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A  UNIFIED  MATHEMATICAL  APPROACH  TO  IMAGE  ANALYSIS 


ABSTRACT.  This  report  describes  a  unified  approach  to  the  mathematical  modelling  and 
statistical  analysis  of  pictures.  The  results  reported  here  were  derived  from  the  research 
project  “Nonparametric  Estimation  by  the  Method  of  Sieves’*  supported  by  the  U.S.  Army 
Research  Office,  Contract  DAAG29-83-K-0116  to  Brown  University.  Early  in  the  project, 
the  investigations  focussed  on  the  estimation  of  (properties  of)  digitized  images;  this  focus 
was  a  natural  extension  of  (i)  our  earlier  work  on  the  method  of  sieves  for  regularizing 
statistical  estimates  of  infinite  dimensional  objects  and  (ii)  the  groundwork  that  had  been 
laid  for  General  Pattern  Theory.  The  present  report  presents  the  Bayesian  paradigm  for 
mathematical  modeling  and  analysis  of  digital  images  and  then  describes  four  instances  of 
the  paradigm  in  detail.  Directions  for  ongoing  and  future  research  are  also  indicated. 

1.  INTRODUCTION 

Computational  image  analysis  encompasses  a  variety  of  applications  involving  a  sensing 
device,  a  computer,  and  software  for  restoring  and  possibly  interpreting  the  sensed  data. 
Most  commonly,  visible  light  is  sensed  by  a  video  camera  and  converted  to  an  array  of  mea¬ 
sured  light  intensities,  each  element  corresponding  to  a  small  patch  in  the  scene  (a  picture 
element,  or  pixel).  The  image  is  thereby  digitized,  and  this  format  is  suitable  for  computer 
analysis.  In  some  applications,  the  sensing  mechanism  responds  to  other  forms  of  light, 
such  as  in  infrared  imaging  where  the  camera  is  tuned  to  the  invisible  part  of  the  spec¬ 
trum  neighboring  the  ccfior  red.  Infrared  light  is  emitted  in  proportion  to  temperature,  and 
thus  infrared  imaging  is  suitable  for  detecting  and  analyzing  the  temperature  profile  of  a 
scene.  Applications  include  automated  inspection  in  industrial  settings,  medical  diagnosis, 
and  targeting  and  tracking  of  military  objects.  In  single  photon  emission  tomography,  as  a 
diagnostic  toed,  individual  photons,  emitted  from  a  radiopharmaceutical  (isotope  combined 
with  a  suitable  pharmaceutical)  are  detected.  The  objective  is  to  reconstruct  the  distri¬ 
bution  of  isotope  density  inside  the  body  from  the  externally-collected  counts.  Depending 
on  the  pharmaceutical,  the  isotope  density  may  correspond  to  local  blood  flow  (perfusion) 
or  local  metabolic  activity.  Other  applications  of  computer  vision  include  satellite  imaging 
for  weather  and  crop  yield  prediction,  radar  imaging  in  military  applications,  ultrasonic 
imaging  for  industrial  inspection  and  a  host  of  medical  applications,  and  there  is  a  growing 
role  for  video  imaging  in  robotics. 

The  variety  of  applications  has  yielded  an  equal  variety  of  algorithms  for  restoration 
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and  interpretation.  Unfortunately,  few  general  principals  have  emerged  and  no  common 
foundation  has  been  layed.  Algorithms  are  by  and  large  ad  hoc;  they  are  typically  dedicated 
to  a  single  application,  and  often  critically  tuned  to  the  particulars  of  the  environment 
(lighting,  weather  conditions,  magnification,  and  so- on)  in  which  they  are  implemented. 
It  is  likely  that  a  coherent  theoretical  framework  would  support  more  robust  and  more 
powerful  algorithms.  A  well-studied  candidate  is  regulrriution  theory  (see  (45]  and  [52], 
and  references  therein),  which  has  been  successfully  applied  to  a  variety  of  vision  tasks.  We 
have  been  exploring  a  related  approach  based  upon  probabilistic  image  models,  well-defined 
principals  of  inference,  and  a  Monte  Carlo  computation  theory.  Exploiting  this  framework, 
we  have  recently  obtained  encouraging  results  in  several  areas  of  application,  including 
tomography,  texture  analysis,  and  scene  segmentation. 

In  the  section  that  follows  (§2)  we  lay  out,  briefly,  our  paradigm  in  its  general  for¬ 
mulation.  Then,  in  each  of  $3,  $4,  and  $5,  we  discuss  an  area  of  application,  to  texture 
segmentation  and  classification,  boundary  detection,  and  single  photon  emission  tomogra¬ 
phy,  respectively.  In  each  application  section  we  summarise  our  previous  work  and  identify 
directions  for  further  research.  In  §6  we  describe  extensions  of  the  methodology  to  more 
ambitious,  high-level,  vision  problems.  This  entails  extending  our  probability  image  models 
to  accommodate  more  global,  geometric,  image  attributes. 

2.  BAYESIAN  PARADIGM 

In  real  scenes,  neighboring  pixels  typically  have  similar  intensities;  boundaries  are  usually 
smooth  and  often  straight;  textures,  although  sometimes  random  locally,  define  spatially 
homogeneous  regions;  and  objects,  such  as  grass,  tree  trunks,  branches  and  leaves,  have 
preferred  relations  and  orientations.  Our  approach  to  picture  processing  is  to  articulate  such 
regularities  mathematically,  and  then  to  exploit  them  in  a  statistical  framework  to  make 
inferences.  The  regularities  are  rarely  deterministic;  instead,  they  describe  correlations  and 
likelihoods.  This  leads  us  to  the  Bayesian  formulation,  in  which  prior  expectations  are 
formally  represented  by  a  probability  distribution.  Thus  we  design  a  distribution  (a  prior) 
on  relevant  scene  attributes  to  capture  the  tendencies  and  constraints  that  characterize  the 
scenes  of  interest.  Picture  processing  is  then  guided  by  this  prior  distribution,  which,  if 
properly  conceived,  enormously  limits  the  plausible  restorations  and  interpretations. 

The  approach  involves  five  steps,  which  we  shall  briefly  review  here  (see  (22]  and 
[33]  for  more  details).  This  will  define  the  general  framework,  and  then,  in  the  following 
three  sections,  we  will  concentrate  on  the  analysis  of  texture,  boundary  detection,  and 
tomography  as  illustrative  applications. 
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IMAGE  MODEL.  This  is  a  probability  distribution  on  relevant  image  attributes.  Both 
far  reasons  of  mathematical  and  computational  convenience,  we  use  Markov  random  fields 
(MRFs)  as  prior  probability  distributions.  Let  us  suppose  that  we  index  all  of  the  relevant 
attributes  by  the  index  set  5.  The  set  5  is  application  specific.  It  typically  includes  indices 
for  each  of  the  pixels  (about  512  x  512  in  the  usual  video  digitization)  and  may  have  other 
indices  for  such  attributes  as  boundary  dements,  texture  labels,  object  labels  and  soon. 
Associated  with  each  site  s  €  5  is  a  real-valued  random  variable  Xt,  representing  the  state 
of  the  corresponding  attribute.  Thus  X,  may  be  the  measured  intensity  at  pixd  s  (typically, 
X,  €  (0,  ...255}),  or  simply  1  or  0  as  a  boundary  dement  at  location  s  is  present  or  absent. 

The  kind  of  knowledge  we  represent  by  the  prior  distribution  is  usually  local,  which  is 
to  say  that  we  articulate  regularities  in  terms  of  small  local  collections  of  variables.  In  the 
mid,  this  leads  to  a  distribution  on  X  =  {A«}«£S  with  a  more  or  less  local  ndghborhood 
structure  (again,  we  refer  to  [22]  and  [33]  for  details).  Specifically,  our  priors  are  Markov 
random  fldds:  there  exists  a  (symmetric)  neighborhood  relation  G  =  {G,},€s,  wherein 
G,  Q  S  is  the  set  of  neighbors  of  s,  such  that 

II(X.  =  xt\Xr  =  *r ,r  €  S,r  ^  a)  =  H(X,  =  x.\Xr  =  *„re  G.) 

II(a|h)  is  conditional  probability,  and,  by  convention,  s  £  G,.  G  symmetric  means  s  e 
Gr  <*  r  €  G,.  (Here,  we  assume  that  the  range  of  the  random  vector  X  is  discrete;  there 
are  obvious  modifications  for  the  continuous  or  mixed  case.) 

It  is  well  known,  and  very  convenient,  that  a  distribution  II  defines  a  MRF  on  5  with 
ndghborhood  relation  G  if  and  only  if  it  is  Gibbs  with  respect  to  the  same  graph,  (S,G). 
The  latter  means  that  n  has  the  representation 


(2.1) 

n(z)=  ^  exp{-l/(x)} 

where 

(2.2) 

f/(x)=  £Vc(z) 

c€C 


C  is  the  collection  of  all  cliques  in  ( S,G )  (a  clique  is  a  collection  c  of  sites  such  that  every 
two  rites  in  c  are  ndghbors),  and  Ve(x)  is  a  function  depending  only  on  (x,},€c.  U  is  known 
as  the  energy,  and  has  the  intuitive  property  that  the  low  energy  states  are  the  more  likely 
states  under  II.  The  normalizing  constant,  z,  is  known  as  the  partition  function.  The  Gibbs 
distribution  arises  in  statistical  mechanics  as  the  equilibrium  distribution  of  a  system  with 
energy  function  U . 
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As  a  simple  example  (too  simple  to  be  of  much  use  for  real  pictures)  suppose  the 
pixel  intensities  are  known,  a  priori,  to  be  one  of  two  levels,  minus  one  (black)  or  plus  one 
(white).  Let  S  be  the  N  x  N  square  lattice,  and  let  G  be  the  neighborhood  system  that 
corresponds  to  nearest  horizontal  and  vertical  neighbors: 

o  —  o  ••• 

I  I 

o  —  o  ... 

I  I 

o  —  o  —  o  •  •  • 

For  picture  processing,  think  of  N  as  typically  512.  Suppose  that  the  only  relevant  regularity 
is  that  neighboring  pixels  tend  to  have  the  same  intensities.  An  energy  consistent  with  this 
regularity  is  the  Ising  potential: 

U(x)  -  -0  ^  ***«»  P  >  0 

M 

where  £|S|<|  means  summation  over  all  neighboring  pairs  s,t  G  S.  The  minimum  of  U  is 
achieved  when  x,  =  xt,  Vs,i  €  S.  Under  (2.1),  the  likely  pictures  are  therefore  the  ones 
that  respect  our  prior  expectations;  they  segment  into  regions  of  constant  intensities.  The 
larger  0,  the  larger  the  typical  region.  Later  we  will  discuss  the  issue  of  estimating  model 
parameters  such  as  0.  (With  energy  function  above,  II  in  (2.1)  is  called  the  Ising  model.  It 
models  the  equilibrium  distribution  of  the  spin  states  of  the  atoms  in  a  ferromagnet.  These 
spins  tend  to  align  themselves  with  each  other  and  hence  the  favored  configurations  contain 
connected  regions  of  constant  spins.) 

One  very  good  reason  for  using  MRF  priors  is  their  Gibbs  representations.  Gibbs 
distributions  are  characterized  by  their  energy  functions,  and  these  are  more  convenient 
and  intuitive  for  modelling  than  working  directly  with  probabilities.  See,  for  example,  [20], 
[22],  [24],  [33],  and  [45]  for  many  more  examples,  and  §3,  §4,  and  §5  below  for  more  complex 
and  useful  MRF  models. 

DEGRADATION  MODEL.  The  image  model  is  a  distribution  II(-)  on  the  vector  of 
image  attributes  X  =  By  design,  the  components  of  this  vector  contain  ail  of  the 

relevant  information  for  the  image  processing  task  at  hand.  Hence,  the  goal  is  to  estimate 
X.  This  estimation  will  be  based  upon  partial  or  corrupted  observations,  and  based  upon 
the  image  model,  i.e.,  the  prior  distribution.  In  emission  tomography,  X  represents  the 
spatial  distribution  of  isotope  in  a  target  region  of  the  body.  What  is  actually  observed  is 
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a  collection  of  photon  counts  whose  probability  law  is  Poisson,  with  a  mean  function  that 
is  an  attenuated  Radon  transform  of  X.  In  the  texture  labelling  problem,  X  is  the  pixel 
intensity  array  combined  with  a  corresponding  array  of  texture  labels.  Each  label  gives  the 
texture  type  of  the  associated  pixel.  The  observation  is  only  partial:  we  observe  the  pixels, 
which  are  just  the  digitized  picture,  but  not  the  labels.  The  purpose  is  then  to  estimate  the 
labels  from  the  picture. 

The  observations  are  related  to  the  image  process  X  by  a  degradation  model.  This 
models  the  relation  between  X  and  the  observation  process,  say  Y  =  {Y, },€t-  For  texture 
analysis,  we  will  define  X  =  ( XP,XL ),  where  Xp  is  the  usual  grey-level  pixel  intensity 
process,  and  XL  is  an  associated  array  of  texture  labels.  The  observed  picture  is  just 
Xp ,  and  hence  Y  =  Xp:  the  degradation  is  a  projection.  More  typically,  the  degradation 
involves  a  random  component,  as  in  the  tomography  setting  where  the  observations  are 
Poisson  variables  whose  means  are  related  to  the  image  process  X.  A  simpler,  and  widely 
studied  (if  unrealistic),  example  is  additive  white  noise.  Let  X  =  {X,}<€s  be  just  the  basic 
pixel  process.  In  this  case  T  =  S,  and  for  each  s  6  S  we  observe 

Y.  =  X.  +  f). 

where,  for  example,  {t7,},€S  is  Gaussian  with  independent  components,  having  means  0 
and  variances  a1,  and  {»?,}  is  independent  of  the  X-process. 

Formally,  the  degradation  model  is  a  conditional  probability  distribution,  or  density, 
for  y  given  X:  II(y|z).  If  the  degradation  is  just  additive  white  noise,  as  in  the  above 
example,  then 

/  I  \  I*!/2  i 

n(l,|l,  =  (w)  “»<-£>  £<*-*•>> 

For  labelling  textures,  the  degradation  is  deterministic;  II(y|z)  is  concentrated  on  y  =  xp, 
where  x  —  (xp,xL)  has  both  pixel  and  label  components. 

POSTERIOR  DISTRIBUTION.  This  is  the  conditional  distribution  on  the  image  pro¬ 
cess  X  given  the  observation  process  Y.  This  posterior  or  a  posteriori  distribution  contains 
the  information  relevant  to  the  image  restoration  or  image  analysis  task.  Given  an  ob¬ 
servation  Y  =  y,  and  assuming  the  image  model  (II(x))  and  degradation  model  (II(y|z)), 
the  posterior  distribution  reveals  the  likely  and  unlikely  states  of  the  “true”  (unobserved) 
image  X.  Having  constructed  X  to  contain  all  relevant  image  attributes,  such  as  locations 
of  boundaries,  labels  of  objects  or  textures,  and  so  on,  the  posterior  distribution  comes  to 
play  the  fundamental  role  in  our  approach  to  image  processing. 
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The  posterior  distribution  is  easily  derived  from  Bayes’  rule 

H(*|y)  = 

The  denominator,  H(y),  is  difficult  to  evaluate.  It  derives  from  the  prior  and  degrada¬ 
tion  models  by  integration:  II(y)  =  /  n(y|x)II(dx),  but  the  formula  is  computationally 
intractable.  Happily,  our  analysis  of  the  posterior  distribution  will  require  only  ratios,  not 
absolute  probabilities.  Since  y  is  fixed  by  observation,  l/II(y)  is  a  constant  that  can  be 
ignored  (see  paragraph  below  on  Computing). 


n(yj»)n(x) 

n(y) 


As  an  example  we  consider  the  simple  Ising  model  prior,  with  observations  corrupted 
by  additive  white  noise.  Then 

H(x)  =  -  exp{/3  ^  x,  xt} 

2  [Ml 


and 

/  1  \ 151/2  1 

n(s|l)  =  (wO  “p{-2^  -  *->’> 

The  posterior  distribution  is  then 

n(x|y)  =  exp{/J  ^  *«**  “  “  x«)’} 

p  M 

We  denote  by  zp  the  normalizing  constant  for  the  posterior  distribution.  Of  course,  it 
depends  on  y,  but  the  latter  is  fixed.  Notice  that  the  posterior  distribution  is  again  a  MRF. 
In  the  case  of  additive  white  noise,  the  neighborhood  system  of  the  posterior  distribution 
is  that  of  the  prior,  and  hence  local. 


For  a  wide  class  of  useful  degradation  models,  including  combinations  of  blur,  additive 
or  multiplicative  colored  noise,  and  a  variety  of  nonlinear  transformations,  the  posterior 
distribution  is  a  MRF  with  a  more  or  less  local  graph  structure.  This  is  convenient  for  our 
computational  schemes,  as  we  shall  see  shortly.  We  should  note,  however,  that  exceptions 
occur.  Indeed,  nonlocal  graph  structures  that  incorporate  long-range  interactions  are  useful 
and  are  completely  consistent  with  the  Bayesian  paradigm.  In  tomography,  for  example,  the 
posterior  distribution  is  associated  with  a  highly  non-local  graph.  This  particular  situation 
incurs  a  high  computational  cost  (see  [24]  and  §5  below  for  more  details)  as  a  consequence 
of  each  site  in  the  graph  having  a  high  degree. 

ESTIMATING  THE  IMAGE.  In  our  framework,  image  processing  amounts  to  choosing 
a  particular  image  x,  given  an  observation  Y  =  y.  One  choice  is  the  maximum  a  posteriori, 
or  MAP  estimate: 


choose  x  to  maximize  II(x|y) 


The  MAP  estimate  chooses  the  most  likely  x,  given  the  observation.  In  many  applications, 
our  goal  is  to  identify  the  MAP  estimate,  or  a  suitable  approximation.  Often,  though,  other 
estimators  are  more  appropriate  (see  Besag  [4}  for  an  insightful  discussion).  We  have  found, 
for  example,  that  the  posterior  mean  (/  xll(dx\y))  is  more  effective  for  tomography,  at  least 
in  our  experiments  (see  §5). 

The  computational  issues  for  posterior  mean  and  MAP  estimators  are  similar.  For 
illustration  we  concentrate  here  on  MAP  estimation.  In  most  applications  we  cannot  hope 
to  identify  the  true  maximum  a  posteriori  image  vector  x.  To  appreciate  the  computational 
difficulty,  consider  again  the  Ising  model  with  added  white  noise: 

(2-3)  II(*|y)  =  j-  exp {/?  x,xt  -  ~  ]T(y,  -  a:,)2} 

9  (Ml  *65 

This  is  to  be  maximized  over  all  possible  vectors  x  =  {x,}»gs  G  {-1, 1}S.  With  |S|  ~  10s, 
brute  force  approaches  are  intractable;  instead,  we  will  employ  a  Monte  Carlo  algorithm 
which  gives  adequate  approximations.  (Remarkably,  this  particular  example  can  be  solved 
exactly,  using  techniques  from  network  flow  theory,  see  [29].) 

Maximizing  (2.3)  amounts  to  minimizing 


which  might  be  thought  of  as  the  posterior  energy.  (As  with  zp,  the  fixed  observation  y  is 
suppressed  in  the  notation  Up(x).)  More  generally,  we  write  the  posterior  distribution  as 

(2.4)  7-exp{-ffp(x)} 

zp 

and  characterize  the  MAP  estimator  as  the  solution  to  the  problem 

choose  x  to  minimize  Up(x) 

The  utility  of  this  point  of  view  is  that  it  suggests  a  further  analogy  to  statistical  mechan¬ 
ics,  and  a  computation  scheme  for  approximating  the  MAP  estimate,  which  we  shall  now 
describe. 

COMPUTING.  When  exploring  a  specific  application,  it  is  our  consistent  experience 
that  computation-intensive  algorithms  for  image  analysis  can  be  made  fast  by  suitable 
compromises  and  exploitation  of  special  structure.  Still,  as  a  research  tool,  it  has  been 
invaluable  to  have  available  a  general  computational  framework,  which  we  now  describe  in 
the  context  of  MAP  estimation. 
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Pretend  that  (2.4)  is  the  equilibrium  Gibbs  distribution  of  a  real  system.  Recall  that 
MAP  estimation  amounts  to  finding  a  minimal  energy  state.  For  many  physical  systems 
the  low-energy  states  are  the  most  ordered,  and  these  often  have  desirable  properties.  The 
state  of  silicon  suitable  for  wafer  manufacturing,  for  example,  is  a  low-energy  state.  Physical 
chemists  achieve  low-energy  states  by  heating  and  then  slowly  cooling  a  substance.  This 
procedure  is  called  annealing.  Cerny  [10]  and  Kirkpatrick  [40]  suggest  searching  for  good 
minimizers  of  U(-)  by  simulating  the  dynamics  of  annealing,  with  U  playing  the  role  of 
energy  for  an  (imagined)  physical  system.  In  our  image  processing  experiments,  we  often 
use  simulated  annealing  to  find  an  approximation  to  the  MAP  estimator. 

Dynamics  are  simulated  by  producing  a  Markov  chain,  X(1),X(2),...  with  transition 
probabilities  chosen  so  that  the  equilibrium  distribution  is  the  posterior  (Gibbs)  distribution 
(2.4).  One  way  to  do  this  is  with  the  Metropolis  algorithm  [47].  More  convenient  for  image 
processing  is  a  variation  we  call  stochastic  relaxation.  The  full  story  can  be  found  in  [22]  and 
[33].  Briefly,  in  stochastic  relaxation  we  choose  a  sequence  of  site3  s(l),s(2),...  €  S  such 
that  each  site  in  5  is  visited  infinitely  often.  If  X(t)  =  x,  say,  then  Xr(t  +  1)  =  xr,  Vr  / 
s(t),  r  G  5,  and  X^t)(t  +  1)  is  a  sample  from 

n(^0  =  .|Xr  =  xr,r/s(t)), 

the  conditional  distribution  on  X,(t),  given  Xr  =  xr,  Vr  /  s(t).  By  the  Markov  property, 


II(X.(0  =  \Xr  =  zr,r  *  s(t))  =  n(^t)  =  -\Xr  =  xP,r  G  G't)) 

where  {G$}f€s  *8  the  posterior  neighborhood  system,  determined  by  the  posterior  energy 
Up(-).  The  prior  distributions  that  we  have  experimented  with  have  mostly  had  local  neigh¬ 
borhood  systems,  and  usually  the  posterior  neighborhood  system  is  also  more  or  less  local 
as  well.  This  means  that  is  small,  and  this  makes  it  relatively  easy  to  generate, 

Monte  Carlo,  X(t  +  1)  from  X(t).  In  fact,  if  fl  is  the  range  of  then 


n(*,(t)  =  a\Xr  =  xr,r  G  G '  )  =  t 


where 


f  a  r  =  s(t) 

(a,,(i)X)r  =  < 

l*P  ryis(t) 

Notice  that  (fortunately!)  there  is  no  need  to  compute  the  posterior  partition  function 
zp.  Also,  the  expression  on  the  right-hand  side  of  (2.5)  involves  only  those  potential  terms 
associated  with  cliques  containing  s(t),  since  all  other  terms  are  the  same  in  the  numerator 
and  the  denominator. 
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To  simulate  annealing,  we  introduce  an  artificial  temperature  into  the  posterior  dis¬ 
tribution: 

_  exp {-U,(x)IT} 

As  T  -*  0,  IIT(-)  concentrates  on  low  energy  states  of  Up.  To  actually  find  these  states, 
we  run  the  stochastic  relaxation  algorithm  while  slowly  lowering  the  temperature.  Thus 
T  =  T(t),  and  T(t )  j  0.  IIt(<)(')  replaces  II(-)  in  computing  the  transition  X(t)  — *■  X(t  + 1). 
In  [22]  we  showed  that,  under  suitable  hypotheses  on  the  sequence  of  site  visits,  s(l),  s(2), 

If  T(t )  >  c/(l  +  log(l  +  t)),  and  T(t)  J,  0,  then  for  all  c  sufficiently  large  X(t)  con¬ 
verges  weakly  to  the  distribution  concentrating  uniformly  on  {x  :  U(x)  =  minyU(y)}. 

More  recently,  our  theorem  has  been  improved  upon  by  many  authors.  In  particular, 
the  smallest  constant  c  which  guarantees  convergence  of  the  annealing  algorithm  to  a  global 
minimum  can  be  specified  in  terms  of  the  energy  function  Up  (see  [27]  and  [37]).  Also,  see 
Gidas  [28]  for  some  ideas  about  faster  annealing  via  renormalization  group  methods. 

In  the  experiments  to  be  described  here,  MAP  estimates  are  approximated  by  using 
the  annealing  algorithm  or  deterministic  variations  of  annealing.  This  involves  Monte  Carlo 
computer  generation  of  the  sequence  A"(1),A(2),. . .,  terminating  when  the  state  ceases  to 
change  substantially. 

3.  TEXTURE  SEGMENTATION  AND  CLASSIFICATION 

Texture  synthesis  refers  to  computer  generation  of  homogeneous  patterns,  usually  intended 
to  match  a  natural  texture  such  as  wood,  grass,  or  sand.  In  many  instances,  Markov  random 
fields  provide  good  models,  and  Metropolis-like  Monte  Carlo  methods  yield  respectable 
facsimiles  of  the  real  textures  ([15], [17]).  Here  we  combine  MRF  texture  models,  for  the 
pixel  process,  with  an  Ising-like  texture-label  process,  in  order  to  segment  and  label  a  scene 
consisting  of  patches  of  natural  textures.  The  image  model  thereby  involves  both  a  pixel 
process,  of  grey-level  intensities,  and  a  label  process,  whose  components  identify  the  texture 
type  of  each  picture  element  in  the  scene.  Our  approach  is  similar  to  those  of  Derin  and 
Elliott  [17]  and  Cohen  and  Cooper  [14],  especially  in  our  use  of  this  two-tiered  image  model. 

IMAGE  MODEL.  The  image  process  comprises  a  pixel  process  and  a  label  process, 
X  =  {XP,XL}.  As  usual,  the  pixel  sites  form  an  N  x  N  square  lattice,  say  Sp.  For 
each  pixel  site  there  is  a  corresponding  label  site,  and  thus  the  graph  associated  with  the 
image  model  has  sites  S  =  Sp  U  SL,  where  SL  is  just  a  copy  of  Sp.  The  elements  of  Sp 
and  SL  index  the  components  of  Xp  and  XL,  respectively,  so  that  Xp  =  {A^}365p  and 


XL  =  {Xa}a€Si .  In  the  experiments  reported  here,  the  pixels  were  allowed  sixteen  possible 
grey  levels  Xp  £  {0,1,. ..,15}, Vs  £  Sp,  whereas  the  range  of  the  labels  depended  on  the 
actual  number  of  textures  in  the  scene,  thus  assuming  this  number  to  be  known  a  priori.  Let 
M  be  the  number  of  textures  that  are  to  be  modelled.  Then  X a  £  {1,2,...,  M},  Vs  £  SL. 

We  shall  develop  the  image  model  by  first  assuming  that  the  texture  type  is  fixed, 
say  type  l  and  constant  over  the  scene.  Conditioned  on  Xa  =  l  £  {1,2,...,  M},  Vs  £  SL, 
the  process  Xp  is  a  Markov  random  field: 

n(xP|Xf'  =  l,s  £  SL)  =  ±exp{-U(‘\xp)} 
where  is  the  usual  normalizing  constant 

z<‘>  =  £ex p{-tf(,>(ip)} 
xP 

Only  pair-cliques  appear  in  the  energy  There  are  six  types  of  pair-cliques,  as  shown 
in  Figure  1.  These  we  index  by  i  £  {1,2, 3, 4, 5, 6}.  We  denote  by  <  s,t  >,  a  pair  of  sites 
s,t  which  form  a  type  i  clique,  and  by  t>.  the  summation  over  all  such  pairs.  With 
these  conventions,  the  (conditional)  energy  is 

(3.1)  UV\xp)  =-£  Y,  ~  *f).  *(*)  =  (1  +  d^l/5)2)-1 

for  some  fixed  6  >  0.  Notice  that  $(if  -  xp)  is  larger  when  if  =  if,  and  is  monotonic 
in  |if  —  if  |.  Because  of  this,  the  texture-dependent  parameters  ...,0g^  determine  the 
degree  to  which  neighboring  pixels,  of  a  particular  type  of  pair-clique,  will  tend  to  have 
similar  grey  levels.  In  fact,  if  6\1^  >  0,  then  for  texture  l  we  expect  pixel  pairs  i4  and  xt , 
of  clique  type  i,  to  typically  have  similar  intensities.  If  <  0  then  the  tendency  is  to 
be  different.  Of  course,  these  simple  rules  are  complicated  by  the  actions  of  the  other  five 
types  of  pair-cliques. 

o  o  o  o 
0  0 
° 

O  V)  o  o 

Figure  1. 

The  parameters  for  t  =  1,2,. ..,6,  and  l  =  1,2, ...,M  are  estimated  from  pictures 
of  the  M  textures,  as  explained  in  the  subsection  on  Parameter  Estimation,  below.  On  the 


other  hand,  $,  and  indeed  the  neighborhood  structure,  is  ad  hoc.  We  have  used  $  exten¬ 
sively  in  other  applications  in  which  our  main  concern  is  with  the  difference  of  intensities 
between  neighboring  pixels.  Of  course  the  quadratic,  $(A)  =  A2,  is  simpler,  but  it  unduly 
penalizes  large  differences. 


Having  modeled  the  M  textures,  we  now  construct  a  composite  Markov  random  field 
which  accounts  for  both  texture  labels,  XL  =  {Xa,  s  €  SL),  and  grey  levels,  Xp  — 
{Xp,  s  £  Sp}.  The  joint  distribution  is 


U(XP  =  xp,XL  =  xL)  = 


L  exp{—Ui(xp,xL)  -  U2(xl)} 


in  which  U2  promotes  label  bonding  (we  expect  the  textures  to  appear  in  patches  rather  than 
interspersed)  and  U\  specifies  the  interaction  between  labels  and  intensities.  Specifically, 
we  employ  a  simple  Ising-type  potential  for  the  labels: 

(3.3)  U2(xL)  =  -fi  £  1xl—sl  +  £  tu(sf'),  fi  >  0 

I*,i] 

Here  (3  determines  the  degree  of  clustering,  [s,t]  indicates  a  pair  of  nearest  horizontal  or 
vertical  neighbors,  and  w(-)  is  adjusted  to  eliminate  bias  in  the  label  probabilities  (more  on 
the  choice  of  w(-)  later). 


To  describe  the  interaction  between  labels  and  pixels  we  introduce  the  symbols 
ti  ,  72 , . . . ,  re  to  represent  the  lattice  vectors  associated  with  the  6  pair-cliques  (Figure  1). 
Thus  s  and  s  +  Tj  are  neighbors,  constituting  a  pair  with  clique  type  t.  The  interaction  is 
then  given  in  terms  of  pixel-based  contributions, 


H(xp,l,s)  =  -  {$(*f  -  *f+Ti)  +  *(*f  “ 


and  local  sums  of  these  called  block-based  contributions, 


Z(xp,l,3)=lj^H(xp,l,t)- 


Here,  Na  is  a  block  of  sites  centered  at  s  (5  x  5  in  all  of  our  experiments),  and  the  constant 
a  is  adjusted  so  that  the  sum  of  all  block-based  contributions  reduces  to  (see(3.1)): 

(3.6)  U^(xp)  =  ^Z(xp,l,s) 

>es 

This  amounts  to  ensuring  that  each  pair-clique  appears  exactly  once  (a  =  50,  for  example, 
when  Nt  is  5  by  5).  In  terms  of  (3.4)  and  (3.5),  the  interaction  energy,  Ui(xp,xL),  is 
written 


Ui(xp,xL )  =  Z(xp,x^,s ). 

»es 


> 


Because  of  (3.6),  the  model  is  consistent  with  (3.1)  for  homogeneous  textures,  =  I,  Vs  € 
5.  The  idea  is  that  each  local  texture  label,  X is  influenced  by  the  pixel  grey  levels  in  a 
neighborhood  of  s. 

Finally,  to  clarify  the  bias  correction  term  tu(-),  we  briefly  examine  the  local  char¬ 
acteristics  of  the  field,  specifically  the  conditional  distributions  for  the  labels  given  all  the 
intensity  data  and  the  values  of  the  neighboring  labels.  (The  actual  neighborhoods  of  the 
Markov  random  field  corresponding  to  (3.2)  can  be  easily  inferred  from  (3.3)  and  (3.7).) 
The  log  odds  of  texture  type  k  to  type  j  is 

f  n(x*  =  k\xj  =  xj,  *  #  r ;  xp  =  »f,  *eS)] 

glnW=M  =  zf,^r;  Xf  =  Xr,seS)J 

=  Z(xp,j,r)-Z(xp,k,r)  +  /1  £(lrf=/fc  -  lxf=i)  +«(;)-«(*) 

=  ;EE  rik)  -  o'.'1)  {*(*?  -  *f+„) + *(*r  - 

i=l 

+ 0  Yl  +  "0*)  -  w(*) 

The  first  term  imposes  fidelity  to  the  data  xp,  and  the  second  bonds  the  labels.  The  efficacy 
of  the  model  depends  on  the  extent  to  which  the  first  term  separates  the  two  types  k  and  j, 
which  can  be  assessed  by  plotting  histograms  for  the  values  of  this  quantity  both  for  pure 
k  and  pure  j  data.  A  clean  separation  of  the  histograms  signifies  a  good  discriminator. 
However,  since  we  are  looking  at  log  odds,  we  insist  that  the  histograms  straddle  the  origin, 
with  positive  (resp.  negative)  values  associated  with  texture  type  k  (resp.  j).  The  function 
w(-)  makes  this  adjustment. 

DEGRADATION  MODEL.  The  degradation  is  deterministic.  The  observation  process 
is  the  pixel  process  Y  =  Xp ,  and  hence  the  degradation  is  just  the  projection  (XP,XL)  — ► 
Xp. 

POSTERIOR  DISTRIBUTION.  In  this  special  case,  the  posterior  energy  is  the  same 
as  the  prior  energy,  but  some  of  the  components  are  fixed.  In  particular 

II  ((xp,xL)|y)  =  —  exp{-l/i(ip,iL)-£/2(*t)}lIe,=y 
zp 

Equivalently,  we  simply  use  \[(xL\xp)  as  the  posterior  distribution: 

II(zL|ip)  =  —  exp  {-t/1(zp,xL)  -  t/2(zL)} 
zv 
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ESTIMATING  THE  IMAGE.  We  use  MAP  estimation:  Given  an  observation,  Xp  = 
xp,  we  seek  xL  to  minimize  U\(xp,xL)  +  Uj(xL). 


COMPUTING.  We  use  stochastic  relaxation,  with  simulated  annealing,  as  described  in 
$2.  A  convenient  starting  point  is  arrived  at  by  "turning  off”  the  Ising  term  in  the  label 
model  (3.3):  we  set  =  0.  Since  this  is  the  only  label/label  interaction  term  in  the  model, 
the  MAP  estimate  of  xL,  with  /?  =  0,  is  determined  by  (locally)  optimizing  xf'  at  each 
s  6  SL.  The  computation  time  is  negligible.  Thereafter,  we  set  /?  to  the  model  value  (see 
the  subsection  on  Parameter  Estimation,  below)  and  begin  stochastic  relaxation.  In  the 
experiments,  each  site  was  visited  about  150  times. 

EXPERIMENTAL  RESULTS.  Three  experiments  were  done  on  texture  discrimination, 
based  on  two  images  with  two  textures  each  and  one  with  four.  There  are  four  textures 
involved:  wood,  plastic,  carpet,  and  cloth.  As  mentioned  above,  the  parameters  were  es¬ 
timated  from  the  pure  types  (see  the  subsection  on  Parameter  Estimation,  below).  There 
was  no  pre-  or  post-processing.  In  particular,  no  effort  was  made  to  "clean-up”  the  bound¬ 
aries,  expecting  smooth  transitions.  The  results  are  shown  in  Figures  2,  3,  and  4;  these 
correspond  to  i)  wood  on  plastic,  ii)  carpet  on  plastic,  and  iii)  wood,  carpet,  and  cloth  on 
plastic  background. 

FIGURES  2,3,4  ARE  PLACED  AT  THE  END  OF  THE  TEXT. 


In  each  figure,  the  left  panel  is  the  textured  scene,  and  the  right  panel  shows  the 
segmentation,  with  texture  labels  coded  by  grey  level.  It  is  interesting  to  note  that  the  grey 
level  histograms  of  the  four  textures  are  very  similar  (Figure  5);  in  particular,  discrimination 
based  on  shading  alone  is  virtually  impossible. 

The  model  is  not  really  adequate  for  texture  synthesis,  samples  generated  from  the 
model  do  not  resemble  the  texture  very  well.  Evidently,  the  utility  of  Markov  random 
field  models  does  not  depend  on  their  capacity  for  simulating  real-world  imagery.  A  more 
serious  drawback  of  our  model  is  that  it  is  dedicated  to  a  fixed  repertoire  of  textures, 
viewed  at  a  particular  orientation  and  at  a  particular  magnification,  or  range.  The  problem 
is  easier  if  the  goal  is  merely  segmentation,  without  recognition.  We  are  experimenting  with 
segmentation  algorithms  that  are  basically  scale  and  orientation  independent.  Indeed,  there 
are  no  texture-specific  parameters.  These  are  built  upon  the  same  modelling/computing 
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framework. 


Figure  5.  Grey  Level  Histograms. 


PARAMETER.  ESTIMATION.  The  performance  of  the  model  is  not  unduly  sensitive 
to  the  choice  of  6  (see  (3.1))  or  0  (see  (3.3)),  which  were  determined  by  trial  and  error.  On 
the  other  hand,  the  pair-clique  parameters  ^°,i  =  l,2...,6,f  =  1,2,. ...A#  characterize  the 
A/  textures,  and  critically  determine  the  ability  of  the  model  to  segment  and  label.  Needless 
to  say,  these  must  be  systematically  estimated.  Trial  and  error  is  not  feasible. 

We  have  estimated  the  parameters  from  samples  of  the  M  textures.  These  training 
samples  contain  only  one  texture  each,  and  we  used  just  one  sample  for  each  texture.  Fbr  a 
fixed  texture,  say  wood,  and  from  a  single  sample,  say  **,  the  problem  then  is  to  estimate 
0i,0j in  the  model 

n(x  -z  ,#)- - — - 
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where 


v{zp,0)  =  -Y,  £  **(*f-*n 

■si  <*.<>• 

and 

,(l)  =  ^«xp{-U(x'’;»)} 

rf 

(We  include  d  =  (9i,...,0«)  in  II, t/,  and  Z  to  emphasize  the  dependencies  on  the  unknown 
parameters.)  The  standard  approach  is  to  maximize  the  likelihood: 

choose  9  to  maximize  W(zp\9) 

Of  course,  maximizing  II  is  equivalent  to  maximizing  log  II.  It  is  easily  demonstrated  that 
the  latter  is  concave  in  9  with  gradient 

(3.8)  Vlog  n(i#’;d)  =  (  £  ♦(*/  -  ztp)  -  E,  £  *(*f  -  Xp) 

where  £#[•]  is  expectation  with  respect  to  II(-;  9).  This  suggests  a  gradient  ascent  procedure, 
but  the  expectation  £#[•]  is  intractable,  involving  summation  over  the  entire  range  of  Xp. 
In  our  experiments,  we  used  a  16  grey-level  scale  for  the  pixels,  and  204  x  204  lattices;  then 
the  expectation  in  (3.8)  has  16<304*,04>  terms.  An  alternative  to  brute  force  evaluation  is 
to  use  stochastic  relaxation  (see  §2),  which  produces  an  (asymptotically)  ergodic  sequence 
XP(\),XP(2), ...  for  any  given  9,  and  from  which  expectations  can  be  approximated  by 
appropriate  time-averages.  This,  too,  is  computationally  intensive,  but  feasible.  In  some 
setting?  we  have  found  no  alternative,  and  this  Monte  Carlo  procedure  has  worked  well, 
albeit  slowly  (see  [43]).  See  also  Hinton  and  Sejnowski  [39]  for  a  closely  related  algorithm, 
used  to  model  learning  in  a  theory  of  neuron  dynamics. 

For  homogeneous  random  fields,  such  as  our  image  models,  Besag  ([2], [3])  has  pro¬ 
posed  an  ingenious  alternative  to  maximum  likelihood,  known  as  maximum  pseudolikelihood . 
The  pseudolikelihood  function  is 

PL(xp-,9)=  J]  n(Xf  =  */| X?  =  TrP ,r  *  s,9) 

where  6SP  is  the  boundary  of  Sp  under  the  neighborhood  system  determined  by  the  energy 
U ,  and  Sp\dSp  is  the  complement  of  dSp  relative  to  Sp .  The  pseudolikelihood  estimator 
is  the  9  that  maximizes  PL(xp,9).  Recently,  we  have  lent  some  analytic  support  by  estab 
lishing  consistency  of  pseudolikelihood  in  the  large-graph  limit  (see  Geman  and  Graffignc 
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[23]).  The  result  is  independent  of  critical  phenomena,  or  lack  of  spatial  stationarity,  both 
of  which  can  occur  with  infinite  volume  Gibbs  states  having  translation-invariant  potentials 
[54]. 

We  emphasize  the  overwhelming  computational  advantage  of  pseudolikelihood.  As 
with  the  log  likelihood  function,  the  log  pseudolikelihood  function,  log  PL(xp\ 6),  is  concave, 
but  this  time  the  gradient  is  directly  computable: 

Vlog  PL(xp;0)  =  V  £  {#(*/  -  xf+Ti)  +  *(xf  -  xp_T>)\ 

iesF\dS’’  v  «=i 

-  log  X)exp  |  {#(a  -  xf+T.)  +  *(a  -  xf_T>)} 

a  l«=l 

(where  ]£<»  >s  summation  over  pixel  grey  levels,  zero  through  fifteen  in  our  experiments) 

£  {  -*f+T.)  +  $(*f 

i*SF\dSF 

-  Jtf+T.)  +  *(Xf  -  X'UjX'’  =  if,'  Ji  .]  }1 

J  t=l,...,6 

This  time,  the  expectation  is  tractable.  The  conditional  distribution  on  Xp,  given  Xp  = 
xp,r  £  s,  involves  only  those  variables  xp  in  the  neighborhood  of  s.  Furthermore,  this 
time  the  summation  for  the  expected  value  is  over  the  range  of  Xp  only,  which  has  only 
sixteen  values.  In  short,  the  gradient  of  the  log  pseudolikelihood  is  directly  computable, 
and  therefore  gradient  ascent  is  feasible  without  resorting  to  time-consuming  Monte  Carlo 
methods.  For  the  texture  experiments  discussed  above,  the  pair-clique  parameters  were 
estimated,  for  each  texture  type,  by  gradient  ascent  of  the  pseudolikelihood  function. 

Some  modifications  of  maximum  likelihood  and  pseudolikelihood  have  been  recently 
introduced  by  Chalmond  [11].  A  third  alternative  was  suggested  by  Derin  and  Elliott 
((17), [18]),  and  has  been  studied  and  analyzed  extensively  by  Po6solo  (53).  This  involves  a 
regression  fit  of  the  log  of  the  local  conditional  probabilities,  and  works  best  when  there  are 
a  small  number  of  values  in  the  range  of  the  random  variables.  For  example,  the  method  is 
very  effective  for  Ising-like  models. 

DIRECTIONS  FOR  FUTURE  RESEARCH.  We  are  continuing  to  study  the  follow 
ing  related  issues: 

( 1)  Performance  of  Estimators.  Rates  of  convergence,  and  error  bounds,  for  parameters 
estimated  by  pseudolikelihood  merit  further  study.  We  hope  to  deepen  our  understanding 


of  how  pseudolikelihood  estimators  compare  to  maximum  likelihood  estimators  and  to  work 
in  the  direction  of  an  elementary  distribution  theory  for  the  pseudolikelihood  method. 


(2)  Better  Models  for  Texture  Synthesis.  We  shall  begin  with  the  recent  advances  by 
Cohen  and  Cooper  [14]. 

(3)  Model-independent  Segmentation.  We  have  begun  development  of  a  method  for 
segmenting  textured  images  without  classifying  the  types  of  textures  present.  This  approach 
requires  no  training.  Preliminary  versions  of  the  model  have  performed  well,  but  our  analytic 
understanding  is  limited.  The  prior  is  again  a  Markov  random  field,  but  on  a  random  graph 
structure.  The  random  graph  is  employed  to  simplify  computation  of  the  MAP  estimator. 
Indeed,  experiments  suggest  that  the  posterior  distribution  has  no  local  minima,  and  hence 
lends  itself  to  simple  iterative  improvement.  We  will  try  to  settle  the  conjecture  that,  in  a 
large  graph  ,:mit,  there  are  no  local  minima  in  the  energies  of  this  class  of  Markov  random 
fields  with  random  graphs. 

The  classical  Ising  model  provides  a  simple  example.  The  usual,  nearest-neighbor, 
energy  is 

#(*)  = 

M 

where  xt  €  {-1, l},s  G  Sn  =  N  x  N  square  lattice,  and  £[,,*)  is  summation  over  all  pairs 
of  horizontal  and  vertical  nearest  neighbors.  Of  course  the  two  configurations  that  achieve 
the  minimum  of  U  are  obvious  (all  -fl’s  and  all  -l’s),  but  nevertheless  difficult  to  find  by 
the  usual  Metropolis-like  relaxation  methods.  This  is  because  there  are  many  flat  ridges 
and  local  minima  in  the  energy  landscape.  Consider,  instead,  a  random  graph  Sn ,  with 
fixed  degree  (say  4),  and  let  this  define  the  neighborhood  system  in  the  expression  for  U. 
Our  simulations  suggest  that  for  N  sufficiently  large  U  has  no  ridges  or  local  minima;  it  can 
be  minimized  by  iterating  through  Sn,  sequentially  changing  the  states  x,  to  create  the 
lowest  contribution  to  U.  In  particular,  when  N  is  sufficiently  large  there  will  be  no  “ties”: 
every  non-minima]  configuration  has  at  least  one  site  s  such  that  changing  the  sign  of  x, 
strictly  lowers  the  energy.  Random  graphs  have  been  key  in  making  our  model-independent 
segmentation  algorithm  computationally  feasible. 
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4.  BOUNDARY  DETECTION 


Here  we  study  boundary  placement  (without  segmentation  per  se)  including  the  problems 
of  locating  sudden  changes  in  depth  (occluding  boundaries)  or  shape  (surface  creases,  etc.). 
The  idea  is  to  define  contours  which  are  faithful  to  the  3-D  scene  but  to  avoid  the  non¬ 
physical  edges  due  to  noise,  digitization,  texture,  lighting,  etc.  Obviously,  there  are  global 
discontinuities,  such  as  shadows,  which  are  essentially  impossible  to  distinguish  from  the 
occluding  and  shape  boundaries,  at  least  without  higher  level  information  or  information 
from  multiple  sensors,  in  which  case  boundary  classification  becomes  possible. 

The  complications  are  well-known:  digital  edges  tend  to  be  very  noisy,  due  in  part 
to  the  digitization  process  itself,  but  also  to  de-focusing  and  random  effects  in  detecting 
the  photons.  The  result  is  a  variety  of  pathologies:  “true”  boundaries  suddenly  disap¬ 
pear,  spurious  ones  appear  haphazardly,  and  in  general  the  surface  transitions  are  highly 
redundant. 

We  formulate  boundary  detection  as  a  single  optimization  problem,  fusing  the  detec¬ 
tion  6.  edges,  as  well  as  pruning,  linking,  smoothing,  etc.;  there  is  no  surface  reconstruction. 
The  subject  of  edge  detection  is  wildly  active,  and  there  has  been  considerable  progress  of 
late  in  designing  filters  based  on  differential  operators  for  optimally  detecting  various  ideal 
step,  crease,  and  other  edges  in  noise-corrupted  1-D  and  2-D  signals  ([9], (44), (56)).  Other 
methods  detect  edges  after  fitting  smooth  surfaces  to  the  data  ([38]  436]  ,[50]).  Still  others 
([5] 449] ,[56], [45])  perform  surface  reconstruction  and  boundary  detection  at  the  same  time, 
and  are  cast  in  a  framework  similar  to  the  set-up  in  [22]. 

The  use  of  boundary  maps  as  the  input  to  further  processing  is  ubiquitous  in  computer 
vision;  for  example,  algorithms  for  stereopsis,  optical  flow,  and  simple  object  recognition 
are  often  based  on  matching  boundary  segments.  Other  applications  include  the  analysis 
of  medical  images  (e.g.  angiograms  and  ultrasound);  automated  navigation  [6];  and  the 
detection  of  the  paths  of  roads  and  geologic  faults,  or  the  edges  of  lakes,  flood  plains,  and 
crop  fields,  in  remotely-sensed  images. 

IMAGE  MODEL.  We  will  give  an  outline.  The  details  can  be  found  in  [21]. 

Denote  the  pixel  lattice  {(«,))  :  1  <  i,j  <  S)  by  Sp  and  let  SB  denote  another  regular 
lattice  interspersed  among  the  pixels  (see  Figure  6(a))  and  of  dimension  ( X  -  1 )  x  ( N  -  1 ); 
these  are  the  boundary  sites. 

We  will  mimic  the  pixel/label  formulation  used  in  §3:  .V  =  { A' A'0},  a  pixel  process 
and  a  boundary  process,  with  Xp  =  {Xp)t€s'  and  The  observation 
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Figure  6. 


6(b) 


process,  or  data,  consists  of  grey-levels  Y,,a  G  Sp,  possibly  distorted  due  to  the  imaging 
and  recording  system.  Again  we  take  X  -*  Y  to  be  a  projection,  so  that  Y  =  Xp, 
assuming  undegraded  observations  of  the  pixel  intensities.  (See  [22]  for  boundary  finding 
with  restoration.)  Given  y  =  x®,  we  wish  to  assign  values  to  the  boundary  variables 
xB  =  {xf  ,3  G  5®},  where  xf  =  1  (resp.  0)  indicates  the  presence  (resp.  absence)  of  a 
boundary  at  site  a  €  5s.  We  have  already  discussed  the  corresponding  interpretation  of 
boundary  map  xB  —  xB{y)  in  terms  of  physical  discontinuities  in  the  underlying  three- 
dimensional  scene. 


We  establish  a  boundary  resolution  or  grid  size  a  >  1.  (This  is  distinct  from  the 
resolution  of  the  pixels,  although  the  two  are  naturally  related.)  Let  SB  ^  C  SB  denote  the 
sub-lattice  {(ur  +  l,j<r  + 1) :  0  <  i,j  <  («-  2)/<r).  Only  the  variables  xf ,  s  £  SBay  interact 
directly  with  the  data;  the  remaining  variables  xf ,  a  G  SB  \  Sfa),  are  determined  by  those 
on  the  grid  j.  Figure  7  shows  the  grids  and  S®)  for  N=8;  the  sites  off  the  grid  are 
denoted  by  dots.  Let  fl®  and  0®  denote  the  state  spaces  of  intensity  arrays  and  boundary 
maps  respectively,  then 

=  {{*f }  :  *  6  Sp ,0  <  xf  <  255),  and  0®,  =  {{xf)  :  s  €  5(®},xf  G  {0,1}}. 


Figure  7. 
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The  selection  of  a  influences  the  interpretation  of  x®,  the  computational  load,  the 
interaction  range  at  the  pixel  level,  and  is  related  to  the  role  played  by  the  size  of  the  spatial 
filter  in  edge  detection  methods  based  on  differential  operators. 
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Let  <  a,  t  >„,  a,t  6  S^,  denote  a  nearest-neighbor  pair  relative  to  the  grid.  Thus, 
a  =  (»(7  +  1,  jo  +  1),  t  =  ( k<7  +  1,  lo  +  1)  is  such  a  (horizontal  or  vertical)  pair  if  either  *  =  k 
and  j  =  / ±  1,  or  j  =  l  and  i  =  k±  1.  We  identify  <  s,t  >„  with  the  elementary  boundary 
segment  consisting  of  the  horizontal  or  vertical  string  of  a  +  1  sites  (in  SB)  including  s,  t 
and  the  a  —  1  sites  between  s  and  t. 

A  disparity  measure  AJt  =  A^(zp)  will  gauge  the  intensity  flux  across  <  s,t  >a,  i.e. 
orthogonal  to  the  associated  segment.  We  will  experiment  with  several  types  of  measures. 
An  obvious  choice  at  high  resolution  (a  =  1)  is  A ,t  =  \xp.  -  xp  |  (or  a  suitable  linear 
transformation  of  this)  where  s*,t*  £  Sp  are  the  two  pixels  associated  with  <  s,t  >„;  see 
Figure  6(b).  The  analogous  choice  at  a  lower  resolution  (a  >  1)  might  be  a  measure  of  the 
form  A  S|  =  m-1|  xt  ~  ICd,  xt*l’  where  Di,Dj  C  Sp  are  adjacent  blocks  of  pixels,  of 
the  same  size  (m)  and  shape,  and  separated  by  <  s,t  >„.  These  and  other  measures  are 
discussed  later  on. 

The  energy  function  U(xp,xB)  should  promote  boundary  maps  xB  which  are  faithful 
to  the  data  y  =  xp  in  the  sense  that  large  values  of  A^x^)  are  associated  with  “on” 
segments  ( xfxf  =  1)  and  small  values  with  “off"  segments  (xfxf  =  0).  There  are  no 
a  priori  constraints  on  xB  at  this  point;  in  fact,  for  most  actual  scenes  xp ,  the  energy 
U  will  actually  favor  maps  xB  with  undesirable  dead-ends,  multiple  representations,  high 
curvature,  etc.  A  simple  choice  for  the  xp/xB  interaction  is 

U(xp,xB)=  (l-*f*f)A.t(x/>) 

<*.«>. 

where  the  summation  extends  over  all  nearest-neighbor  pairs  <  s,t  >„. 

Returning  to  the  disparity  measure,  we  employ  one  type  for  depth  and  shape  bound¬ 
aries  and  another  for  the  texture  experiments.  In  the  former  case,  the  disparity  measure 
involves  the  (raw)  grey-levels  only,  whereas  for  texture  discrimination  we  also  consider  data 
transforms  based  on  so-called  directional  residuals.  Except  when  o  -  1,  the  data  sets  are 
compared  by  the  Kolmogorov-Smimov  distance  [7],  as  explained  below. 

At  the  highest  resolution  (<r  =  1),  the  simple  measure  |x£  -  x£|  suitably  normalized 
(where  s*,t"  are  the  two  pixels  associated  with  the  boundary  sites  s,t\  see  Figure  6(b))  can 
be  effective  for  simple  scenes. 

At  lower  resolution,  let  D\,Dj  denote  two  adjacent  blocks  of  pixels,  of  equal  size 
and  shape.  An  example  is  illustrated  in  Figure  8  for  the  case  of  two  square  blocks  of  size 
5*  =  25  which  straddle  a  vertical  segment  <  s,  t  >a  with  o  =  3.  Let  xp(D})  =  {xp ,s  £  Dj), 


j  =  1,2,  be  the  grey-levels  in  the  blocks  and  set 


A1>t(xp)  =  d(xp(Di)fxp(D2)) 

where  d(-,  •)  is  a  monotonic  function  of  the  Kolmogorov-Smirnov  distance  between  two 
empirical  distributions.  This,  for  example,  is  the  type  of  disparity  measure  we  use  for  the 
house  scene  and  the  radar  scene  (shown  below). 
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Figure  8. 


Various  enhancements  in  A  are  necessary  to  detect  the  more  subtle  transitions  some¬ 
times  found  between  two  textured  regions.  These  include  the  use  of  multiple,  high-order, 
spatial  features  in  the  definition  of  d.  In  the  experiments  below,  several  such  features  were 
used  in  segmenting  the  Brodatz  texture  collage. 

The  other  component  in  the  model  is  a  penalty  function  V(xB)  which  counts  the 
number  of  “taboo”  patterns  in  xB\  states  xB  for  which  V(xB)  >  0  are  “forbidden.”  For 
example,  boundary  maps  are  penalized  for  dead-ends,  clutter,  density,  etc.  V(xB)  denotes 
the  total  number  of  “penalties”  associated  with  xB  €  fi®  j .  These  penalties  are  simply  local 
binary  patterns  over  subsets  of  SBy  Figure  9  illustrates  a  family  of  four  such  patterns; 
they  can  be  associated  with  any  resolution  a  by  the  obvious  scaling. 

o  i  ;  ii 

0  10  10  1  1  11 

Figure  9. 


The  patterns  in  Figure  9  correspond,  respectively,  to  an  isolated  or  abandoned  seg¬ 
ment,  sharp  turn,  quadruple  junction,  and  too-small  structure.  Depending  on  <r,  the  pixel 
resolution,  and  scene  information,  we  may  or  may  not  wish  to  include  the  latter  three.  For 


example,  including  the  last  one  with  a  —  6  would  prohibit  detection  of  a  square  structure 
of  pixel  size  6x6. 


Finally,  our  image  model  is 

H(*p,zB)  = 

the  usual  Gibbs  distribution,  but  confined  to  the  allowed  boundary  states. 

DEGRADATION  MODEL.  The  degradation  is  again  deterministic;  we  observe  the 
projected  data  y  =  xp  *-  (xp,xB). 

POSTERIOR  DISTRIBUTION.  As  in  §3,  this  is  just 

n((zp,*B)|»)  =  —  exp{-ir(zp,*fl)}lV(*«)=ol*'>=v 

zf 

ESTIMATING  THE  IMAGE.  Our  experiments  in  boundary  finding  have  been  with 
(approximate)  MAP  estimators. 

COMPUTING.  This  again  is  by  stochastic  relaxation.  An  extension  of  the  theory  outlined 
in  $2  is  needed  to  handle  the  constrained  optimization  problem 


minimizex«:v(s*)so  (1  -  *? zf  )A,tt(xp). 

By  introducing  the  constraints  slowly  during  the  annealing  sequence,  a  global  minimum, 
within  the  constrained  subset,  can  again  be  found.  The  theory  behind  this  enhancement  of 
stochastic  relaxation  is  layed  out  in  [21]. 

EXPERIMENTAL  RESULTS.  In  each  of  Figures  10,  11,  and  12  the  original  scene 
is  in  the  top  panel.  A  version  of  stochastic  relaxation  is  used  to  incorporate  constraints 
while  minimizing  V .  The  placement  of  boundaries  is  shown  in  the  bottom  panel.  Figure 
10  is  a  synthetic  aperture  radar  picture  of  water  and  ice  (water  is  the  region  generally 
appealing  darker),  and  Figure  11  is  a  collage  of  four  Brodatz  textures  [8],  chosen  for  the 
rather  subtle  transitions  created  by  their  juxtapositions.  The  bottom  panel  of  Figure  11 
shows  the  evolution  of  the  boundary  process  during  relaxation. 

FIGURES  10,11,12,  ARE  PLACED  AT  THE  END  OF  THE  TEXT. 

DIRECTIONS  FOR  FUTURE  RESEARCH.  These  models  may  be  extended  in  many 
directions.  What  follows  is  a  brief  description  of  two  such  generalizaitons. 


22 


(1)  Other  Boundary  Primitives .  The  elementary  boundary  unit  or  primitive  is  a  hori¬ 
zontal  or  vertical  segment  whose  length  is  resolution-dependent,  namely  o  + 1  in  pixel  units. 
As  a  result,  a  discontinuity  running  at  45°  is  detected  and  localized  with  less  reliability  than 
one  at  0°  or  90°,  where  the  disparity  measure  has  maximum  sensitivity.  An  obvious  remedy 
is  to  replace  the  pairs  <  s,t  >0  by  other  primitive  segments,  e.g.  a  distinguished  family  of 
triples,  such  as  the  six  represented  (up  to  translates)  in  Figure  13. 
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Figure  13. 

More  generally,  for  any  such  family  A  consider  the  interaction  term 


^(i-nseAxf)AA(xp) 

A€A 

where  Aa(xp)  is  a  measure  of  the  disparity  in  the  A-direction.  In  Figure  13,  these  directions 
are,  respectively,  x/4,  x/4,  —x/4,  —  x/4,  x/2,  and  0.  One  can  imagine  a  variety  of  ways 
to  situate  two  appropriately  shaped  sets  of  pixels  to  straddle  (and  abut)  segments  such  as 
these. 

(2)  Multivariate  Data.  Data  may  be  available  from  several  sensors,  e.g.  multispectral 
satellite  data,  or  an  optical  camera  and  laser  radar.  One  might  integrate  these  with  a 
composite  disparity  measure  of  the  type  used  in  the  texture  studies,  viz.  A  =  max  A^fc^  in 
order  to  improve  discrimination. 

Another  possibility  is  to  attempt  to  classify  boundaries,  especially  if  range  data  is 
also  available.  Let  xP(I)  and  xp<2>  be  range  and  brightness  values,  and  let  xf  assume 
three  values,  say  0,1,2,  corresponding  to  “off”,  “occluding  (=depth)  boundary”  and  “other 
boundary”  (for  instance  a  crease  or  shadow).  Now  design  the  energy  function  to  couple 
A(xP(1>)  with  £  =  {&,},  =  tf{i}(z? )  and  A(xp<2>)  with  77  =  {77,},  77,  =  ^i>2}(xf );  for 

example,  just  add  the  two  corresponding  terms  to  form  U.  The  penalty  patterns  are  the 
usual  ones  regarding  dead-ends,  etc.,  as  well  as  mixtures  of  0,1,2  corresponding  to  physically 
implausible  (or  impossible)  transitions. 


5.  SINGLE  PHOTON  EMISSION  TOMOGRAPHY 


Emission  tomography  is  used  to  determine  the  distribution  of  a  pharmaceutical  in  a  part 
of  the  body  such  as  the  brain,  liver,  or  heart.  Depending  upon  the  pharmaceutical  used, 
this  concentration  can  be  taken  as  a  measure  of  local  blood  flow  (perfusion)  and/or  local 
metabolic  activity.  Glucose,  for  example,  is  taken  up  by  neuronal  cells  in  proportion  to 
metabolic  activity,  and  the  latter  generally  mirrors  recent  electrical  activity.  Thus,  areas  of 
the  brain  most  used  in  performing  a  cognitive  or  motor  task  will  demonstrate  a  relatively 
increased  uptake  of  glucose  immediately  following  the  task.  For  the  heart,  pharmaceuticals 
can  be  chosen  whose  uptake  reflects  local  perfusion.  The  concentration  of  these  pharma¬ 
ceuticals  can  thereby  be  used  to  assess  the  adequacy  of  blood  flow  to  the  different  parts  of 
the  heart. 

In  single  photon  emission  computed  tomography  (SPECT),  pharmaceutical  concen¬ 
tration  is  estimated  by  detecting  photon  emissions  from  an  injected  or  inhaled  dose  of 
the  pharmaceutical  that  has  been  chemically  combined  with  a  radioactive  isotope.  This 
combined  agent  is  called  a  radiopharmaceutical.  The  goal  of  SPECT  is  to  determine  radio¬ 
pharmaceutical  concentration  (equivalently,  isotope  concentration  or  density)  as  a  function 
of  position  in  a  region  of  the  body.  Detectors  with  collimators  are  strategically  placed 
around  the  region  of  interest,  and  these  are  able  to  count  photons  emitted  by  radioactive 
decay  of  the  isotope.  A  detector  will  capture  those  photons  which  escape  attenuation  and 
whose  trajectories  carry  them  down  the  bore  of  the  collimator. 

The  determination  from  photon  counts  of  isotope  concentration  as  a  function  of  po¬ 
sition  is  referred  to  as  reconstruction.  Interest  in  statistical  approaches  to  reconstruction 
problems  in  emission  computed  tomography  was  greatly  enhanced  by  the  work  of  Shepp  and 
Vardi  [55]  on  the  use  of  maximum  likelihood  (ML)  methods.  There  are  earlier  instances 
of  suggestions  to  regard  the  reconstruction  problem  as  a  statistical  estimation  problem; 
however,  the  demonstration  of  the  versatility  of  the  approach  as  well  as  the  specification  of 
algorithms  that  work  were  advanced  substantially  by  Shepp  and  Vardi’s  work. 

The  image  reconstruction  problem,  viewed  as  an  estimation  problem,  is  inherently 
nonparametric:  one  seeks  an  estimate  of  a  function  of  general  form  on  a  continuous  domain. 
As  such,  it  is  widely  recognized  that  the  estimates  need  to  be  regularized  or  smoothed,  espe¬ 
cially  in  “small  sample”  implementations.  Various  approaches  to  regularization  have  been 
suggested,  including  penalized  ML,  the  method  of  sieves,  and  Bayesian  methods.  In  Geman 
and  McClure  [24],  we  proposed  that  a  priori  spatial  information  be  built  into  a  statistical 
reconstruction  algorithm,  in  a  Bayesian  approach,  by  quantifying  spatial  constraints  in  the 


form  of  a  Gibbs  prior  distribution.  In  Geman  and  McClure  [25]  we  demonstrated  the  feasi¬ 
bility  of  estimating  parameters  of  the  Gibbs  prior  from  single  patient  data.  A  brief  outline 
of  the  Bayesian  approach,  with  parameter  estimation,  follows. 

IMAGE  MODEL.  Let  X,  denote  the  concentration  of  the  radiopharmaceutical  at  the 
point  s  =  ( x,y )  in  the  domain  ft  of  interest.  We  shall  take  ft  to  be  a  bounded  two- 
dimensional  region,  though  for  the  models  and  methods  we  will  describe  there  are  no  es¬ 
sential  changes  when  ft  is  three-dimensional. 

We  shall  construct  a  prior  distribution  on  X  that  captures  simple  prior  expectations 
about  the  qualitative  nature  of  the  isotope  density.  Mainly,  we  wish  to  exploit  the  antic¬ 
ipated  smoothness  of  X.  Neighboring  locations  will  typically  have  similar  intensity  levels. 
But  we  must  also  accommodate  sharp  changes  in  concentration,  which  might  occur  across 
an  arterial  wall  or  across  a  boundary  between  two  tissue  types. 

In  the  spirit  of  nonparametric  estimation,  we  might  construct  the  prior  on  a  suitable 
space  of  functions  X  :  ft  -»  R.  It  is  more  convenient,  however,  to  do  the  construction 
on  the  discrete  lattice  domain  5  introduced  in  §2.  The  prior,  therefore,  is  on  the  array 
X  =  {X,}j€s-  As  usual,  we  adopt  the  Gibbs  representation  (2.1). 

The  expected  configurations  are  those  for  which  typical  neighboring  sites  s,  t  6  S  have 
similar  intensities  X,,Xt.  This  is  a  local  constraint  and  it  is  conveniently  captured  by  a 
locally  composed  energy  function  U , 

(5.1)  U(X)  =  -*«)+£  -4 <KX>  -  Xt). 

[*,t]  <»,*>  V 1 

Here  we  use  [s,  t]  to  indicate  that  s  and  t  are  nearest  horizontal  or  vertical  neighbors  in  the 
lattice  5,  and  <  s,t  >  to  denote  diagonal  neighbors.  The  constant  /?  is  positive  and  the 
function  <j>(£)  is  even  and  minimized  at  £  =  0.  Thus  U  is  minimized  by  configurations  of 
constant  intensity.  Under  the  Gibbs  distribution  with  energy  (5.1)  the  more  likely  isotope 
densities  are  those  with  small  site-to-site  variation  in  intensity. 

To  achieve  the  desired  properties  for  the  more  likely  isotope  densities,  the  exact  form 
of  <f>  is  probably  not  important,  but  its  qualitative  features  can  make  a  difference.  We  have 
experimented  with  </>’s  that  are  increasing  in  £  for  £  >  0.  An  obvious  choice  is  <£(£)  =  £2, 
but  then  under  11(A),  large  intensity  gradients,  as  would  be  associated  with  certain  natural 
boundaries,  are  exceedingly  unlikely.  Instead,  we  use  functions  of  the  form 

m  =  1  +  (£/*)2 


(5.2) 


where  6,  like  /?,  is  a  constant  to  be  fixed  later. 

Levitan  and  Herman  [41]  have  recently  proposed  the  use  of  Gaussian  priors  in  a 
Bayesian  formulation.  Liang  and  Hart  [42]  also  suggest  the  use  of  Gaussian  priors,  as  well 
as  others,  deduced  by  max-ent  arguments  from  prior  constraints  on  low-order  moments  of 
X.  Our  earlier  experiments  with  the  quadratic  energy  function  indicated  that  the  resulting 
Bayesian  algorithms  oversmoothed  real  boundaries  where  the  difference  (X3  -  Xt)  should 
be  allowed  to  be  large.  The  finite  asymptotic  behavior  of  our  (^-function  was  designed  to 
mitigate  this  oversmoothing. 

DEGRADATION  MODEL.  We  assume  that  the  detectors  are  arranged  in  a  linear  array, 
at  equally  spaced  lateral  sampling  intervals,  and  that  the  detector  array  can  be  positioned 
at  any  orientation  0  relative  to  the  x-axis.  (See  Figure  14.)  We  assume  the  detectors  are  of 
so-called  parallel  bore  type,  meaning  that  they  detect  only  those  photons  in  a  small  interval 
[6  -  Ad/2, 6  +  A6/2)  when  the  array  has  orientation  6.  Let  L  denote  the  total  number  of 
detectors  in  the  array  and  let  A  a  denote  the  spacing  between  detectors. 


a 


Figure  14.  Detector  Geometry. 

The  physical  effects  incorporated  in  the  model  are  the  spatial  Poisson  process  that 
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describes  the  sites  of  the  radioactive  decays  from  which  photons  emanate  and  the  process 
of  photon  attenuation  by  which  photons  are  annihilated  and  their  energy  is  absorbed  by 
matter  through  which  their  trajectories  pass.  Attenuation  is  accurately  described  by  a 
linear  attenuation  function  p(s)  on  ft.  The  function  p  is  assumed  to  be  known;  values  of 
p  for  bone,  muscle,  etc.  and  for  various  photon  energies  are  known  a  priori  or  could  be 
measured  by  transmission  tomographic  methods.  Attenuation  is  a  memoryless  process  and 
we  can  thus  deduce  the  functional  form  of  the  probability  that  a  photon  survives  to  reach 
the  detector  array.  When  a  photon  trajectory  has  direction  0  and  it  emanates  from  site 
s  =  ( x,y )  in  ft,  then 

P(photon  survival)  =  exp {-  /  p({,q)dl}, 

Jc(x,y) 

where  the  line  integral  is  taken  over  the  segment  C(x,y)  from  (z,y)  to  the  detector  and  dl 
is  differential  arc  length. 

For  our  sampling  design,  we  shall  position  the  detector  array  at  n  equally  spaced 
angles  Ok  for  duration  T  time  units  at  each  angle.  Then  at  each  angle,  we  observe  the 
random  variables  Yt,  for  t  £  D*  =  {(<7j,  0*),.?  =  1  ,...,£}  that  give  the  numbers  of  photons 
reaching  the  respective  detectors  during  the  sampling  interval.  Assuming  that  (i)  photons 
are  generated  by  a  spatially  nonhomogeneous  Poisson  process  with  intensity  X,  per  time 
unit,  and  (ii)  the  orientations  0  of  photon  trajectories  are  uniformly  distributed  on  (0,2ir), 
we  can  show  that  Yt.  for  t  £  D  =  U£=1.D*,  is  itself  a  Poisson  process  with  a  nonhomogeneous 
intensity  function  described  in  terms  of  the  attenuated  Radon  transform  (ART)  of  X.  The 
ART  of  X  is  defined  as 

(R^tX)(o,0)=  j  TX(x, y)exp(-  /  pU,v)dl')dl 

JC  JC.(x,y) 

where  £  is  the  line  with  orientation  0,  through  point  o  of  the  detector  array,  C(x,y)  is  the 
segment  of  £  starting  at  point  ( x,y )  in  ft,  and  dl  and  dl'  are  differential  arc  length  in  the 
two  line  integrals.  The  intensity  function  of  Y  is  then  given  by 

EYt  =  /  /  {R^TX)(o,0)dod0> 

J$k-&8/1  Jaj-aa/1 

where  t  =  (oj,  Ok).  The  important  feature  of  this  representation  is  that  the  intensity  function 
of  Y  is  the  result  of  applying  a  positive  linear  integral  operator  At  to  A': 

(5.3)  EY  =  AtX. 

By  further  exploiting  properties  of  the  Poisson  process,  it  can  be  shown  that  the 
observables  Yt  are  mutually  independent  and  Poisson  distributed;  the  likelihood  function 
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is  then  easily  obtained  from  (5.3).  Recall  oar  discretisation  of  the  domain  ft  into  pixels 
parameterized  by  discrete  points  s  in  a  square  lattice  S.  Now  {A,},€s  represents  a  piecewise 
constant  approximation  of  the  isotope  concentration  on  the  continuous  domain.  When  ft  is 
discretized,  then  equation  (5.3)  takes  the  form 

EY  =  AtX, 

where  At  is  a  matrix,  Aj  =  {A(t,s)}tGD^es\  commonly,  the  order  of  At  is  extremely  large 
and  it  may  not  have  full  column  rank.  Now  for  a  given  X,  the  Poisson  probability  function 
of  Y  is 

(5.4) .  H(Y\X)  =  I]  ((Ary,(-)]y-  exp{-MT*)(t)} 

t*D  Tt‘ 

It  is  interesting  to  examine  reconstruction  by  the  method  of  maximum  likelihood, 
which  calls  for  choosing  X  to  maximize  Il(y|X),  given  observed  counts  Y.  Shepp  and  Vardi 
[55]  have  developed  effective  algorithms  based  on  EM  (Dempster,  Laird  and  Rubin  [16])  for 
implementing  ML  reconstructions  in  positron  emission  tomography  (PET).  A  penetrating 
description,  written  from  a  statistician’s  perspective,  is  given  in  Vardi,  Shepp  and  Kaufman 
[57].  (In  PET,  photon  attenuation  does  not  enter  the  model  relating  isotope  concentra¬ 
tion  to  the  observables.)  McClure  and  Accomando  [1]  have  developed  the  foundations  for 
applying  ML  to  SPECT  reconstructions  and  have  implemented  EM  algorithms  on  a  vari¬ 
ety  of  computer  systems.  Independently,  Miller,  Snyder  and  Miller  [48]  have  made  similar 
extensions  of  ML  and  EM  for  SPECT. 

The  log-likelihood  function  is 

lnL(X)  =  53{-ln(yl!)+r,ln[(.4TX)(0]-(ATX)(0}.  (3.2) 

The  necessary  conditions  for  maximizing  In  L(X)  obtained  by  setting  derivatives  to  zero  do 
not  yield  explicit  solutions  for  a  maximizing  X.  Nonetheless,  -  InL(A)  is  globally  convex, 
and  the  ML  optimization  problem  conveniently  adapts  to  the  EM  method.  In  general, 
-  In  L(X)  is  not  strictly  convex;  this  is  an  identifiability  issue  related  to  the  column  rank 
of  At-  Conditions  for  strict  convexity  are  discussed  by  Accomando  [1], 

POSTERIOR  DISTRIBUTION.  From  (2.1),(5.1)^5.2),  and  (5.4)  the  posterior  distri¬ 
bution  on  X  is 

(5.5)  JI(*|y)=  -^-exp{-f/(A')+  y'[yin[(Ar.V)(0]-(M/.V)(0]} 

/j(Y)  TtD 
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where  Z(Y)  is  a  normalizing  constant  that  depends  on  Y. 

ESTIMATING  THE  IMAGE.  Onr  experiments  indicate  that  the  minimum  mean  squar¬ 
ed  error  (MMSE)  estimator  (X*  =  E(X\Y))  is  more  appropriate  for  the  reconstruction 
problem  than  the  MAP  used  for  texture  classification  ($3)  and  boundary  detection  (§4). 

COMPUTING.  We  initialize  X  =  A(0).  In  practice,  we  choose  a  “good”  initialization, 
such  as  the  maximum  likelihood  reconstruction  via  EM,  but  easy  theory  says  that  conver¬ 
gence  is  independent  of  the  initialization.  Stochastic  relaxation  produces  an  ergodic  Markov 
chain  with  equilibrium  (5.5).  The  ergodicity  of  the  chain  guarantees  that  an  ergodic  average 
of  {A(r)}“  0  will  converge  to  X *  a.s.  In  practice,  we  compute  N  iterates  and  average  the 
final  AS,  with  choices  such  as  N  =  25  and  AS  =  5.  The  selection  of  suitable  AS  and  N  can 
be  guided  by  monitoring  stabilization  of  statistics  of  the  successive  iterates  X(T*. 

PARAMETER  ESTIMATION.  The  choice  of  0  is  critical.  With  0  =  0  the  estimator 
is  undersmoothed,  and  in  fact  MAP  estimation  is  just  ML,  since  the  prior  is  uniform.  If  0 
is  too  large,  the  estimator  is  too  faithful  to  the  prior  and  is  oversmoothed.  The  parameter 
6  is  also  important,  though  we  have  found  that  (i)  its  value  can  usually  be  set  based  on 
information  about  the  range  of  values  {A,},  and  (ii)  reconstructions  are  not  sensitive  to 
moderate  changes  in  S.  The  discussion  here  will  focus  on  0. 

Because  of  the  setting  in  which  reconstruction  algorithms  are  actually  used,  it  is 
desirable  to  design  estimation  methods  that  work  with  a  sample  Y  of  size  one  from  the 
observable  process.  The  isotope  density  X  is  assumed  to  be  drawn  from  a  Gibbs  prior 
with  unknown  0,  but  known  6  (5.2).  We  shall  estimate  0  from  Y  and  use  the  estimate  0 
in  the  MMSE  or  MAP  reconstruction  program.  It  is  reasonable  to  do  this  with  a  single 
observation  Y ,  since  Y  contains  a  large  amount  of  data  about  X,  which,  in  turn,  contains 
a  large  amount  of  data  about  the  local  energy  function  U(X). 

To  be  more  explicit  about  the  dependency  on  0  of  the  prior  and  posterior  distributions, 
we  introduce  the  function 

V(X)  =  -  Xt)  +  ~  £  -  Xt). 

I».‘l  * 1  <*.«> 

V  is  just  U 10.  The  prior  is  now  written 

n(X)=±exp{-0V(X)} 

z0 

and  the  posterior,  given  Y ,  is 

II(A|V)=  — \^exp{-0V(X)  +  r[YM{ATXm-(ATX)(t)]} 

Zp\Y)  f€D 


Now  V{X)  is  a  complete-data  sufficient  statistic  for  0.  If  we  were  ftble  to  observe  X 
directly,  then  we  could,  in  principle,  solve  the  likelihood  equation 

(5.6)  Efi[V(X)]  =  V(X) 

for  the  ML  estimate  of  0.  The  left-hand  side  of  (5.6)  is  strictly  decreasing  in  0  and  thus 

(5.6)  yields  a  unique  root  0. 

Our  situation  is  more  complicated  than  this  since  we  do  not  observe  X,  but  instead 
we  see  only  the  incomplete  data  Y .  We  have  a  classic  setup  for  application  of  EM.  The  EM 
algorithm,  when  it  converges,  will  yield  a  root  of  the  incomplete-data  likelihood  equation 

(5.7)  E0[V(X) ]  =  Ep(V(X)\Yy, 

see  Dempster,  Laird  and  Rubin  [16].  We  note  that  there  is  no  proof  of  uniqueness  of  roots 
of  (5.7).  Conceptually,  (5.7)  is  solved  at  the  intersection  of  two  monotone  decreasing  func¬ 
tions  of  0.  Whether  (5.7)  does  admit  multiple  solutions  is  an  open  and  elusive  theoretical 
question. 

To  solve  (5.7),  the  EM  algorithm  consists  of  two  alternating  steps — estimation  of  the 
right-hand  side  of  (5.7)  for  prescribed  0  (E-step)  and  computation  of  the  root  0  of  (5.7), 
substituting  the  current  estimate  of  Ep(V(X)\Y)  on  the  right-hand  side.  Specifically,  we 
fix  an  initial  0  =  j3°  and  an  initial  X  =  X°  (and  hence  V0).  Then  solve 

E-step.  Estimate  the  complete-data  sufficient  statistic: 

(5.8a)  V(r+1>  =  E^(V(X)\Y) 

M-step.  Determine  /?*T+1)  as  the  solution  of 
(5.85)  £70{V(AT)]  =  V<T+1>. 

The  first  step  is  done  using  stochastic  relaxation,  using  say  ten  steps  of  stochastic  relaxation 
and  averaging  the  last  five  values  of  V(X ^).  The  second  step  is  a  simple  root-finding 
calculation  once  the  curve  £^(V'(.Y)]  is  known.  Conveniently,  the  stochastic  relaxation 
procedure  simultaneously  yields  updates  A'(r)  of  the  MMSE  reconstruction.  Thus  (5.8a) 
and  (5.8b)  together  give  a  completely  data-driven  method  of  reconstruction. 
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The  construction  of  Eff[V(X)]  u  a  function  of  0  can  be  done  off-line,  once  and  for  all. 
We  have  done  this  using  stochastic  relaxation  to  simulate  230  configurations  X  from  the 
prior  (4.1)  for  /Lvalues  ranging  from  0  to  6.  Five  replications  were  done  at  each  of  forty-six 
values  of  0.  The  resulting  curve,  fit  by  a  cubic-spline  regression  function,  is  depicted  in 
Figure  15.  The  calculation  of  this  curve  required  forty-one  hours  of  CPU  time,  using  a 
highly  optimized  program  on  the  100  Megaflop  Star  Technologies  ST100  Array  Processor. 


Figure  15. 

J.  Mertus  [46]  has  developed  an  efficient  vectorized  FORTRAN  program  for  the  EM 
estimation/reconstruction  procedure  described  above.  Each  E-step,  with  ten  sweeps  of 
stochastic  relaxation,  takes  on  the  order  of  seven  minutes  of  CPU  time  on  an  1BM3090  or 
about  three  minutes  on  a  CYBER  205,  working  on  a  64  x  64  pixel  lattice  S\  for  isotope 
densities  X  having  their  support  on  a  disk  of  diameter  44  pixels  (about  22cm)  and  with  a 
range  of  64  grey  levels.  (These  values  correspond  to  our  real  data  sets.)  The  computational 
requirements  are  enormous,  but  not  prohibitive. 

EXPERIMENTAL  RESULTS.  We  report  on  two  experiments  which  have  boon  run 
on  real  and  simulated  data  to  learn  about  the  performance  of  the  Bayesian  reconstruction 


method*.  One  simulation  experiment  was  designed  to  test  the  versatility  and  robustness 
of  the  methods  to  known  departures  from  the  model.  The  other  experiment  illustrates  the 
performance  of  the  algorithms  on  real  data  from  a  lung  section. 

Experiment  1.  A  phantom  isotope  density  (Figure  16A)  was  designed  to  have  a  combi¬ 
nation  of  (i)  large-scale  structure,  including  subregions  of  ft  with  considerable  differences 
in  intensity,  and  (ii)  local  irregularity  of  the  same  qualitative  nature  as  that  of  sample 
functions  from  the  Gibbs  model  with  energy  (5.1)-(5.2),  yet  not  precisely  fitting  the  Gibbs 
model.  Two  functions  were  averaged  to  form  the  phantom.  First,  an  array  with  a  sharp 
spike  in  intensity  (near  the  center,  below  the  middle)  was  constructed.  Second,  an  array  was 
sampled  from  (2.1)  (using  (5.1  )-(5.2))  with  parameter  values  0  =  1,  and  i  =  12.  Intuitively, 
the  local  structure  of  the  average  will  be  governed  by  the  array  sampled  from  the  Gibbs 
model.  But  observe  that  the  rescaling  of  this  array  due  to  the  arithmetic  averaging  means 
that  it  will  not  exactly  fit  a  model  from  the  same  family.  Roughly  speaking,  the  averaging 
has  the  effect  of  smoothing  the  array  so  that  it  will  be  better  described  by  a  Gibbs  model 
with  larger  /9-value,  assuming  6  is  fixed  for  now.  We  thus  anticipate  estimated  values  of  0 
larger  than  the  value  0=1  used  to  generate  the  Gibbsian  part  of  the  averaged  phantom. 

To  simulate  the  emitted  photons,  the  constant  linear  attenuation  function  p  =  0.2 
was  chosen,  corresponding  to  approximately  ten  percent  attenuation  per  centimeter  for  our 
scaling  of  the  Teal  system.  A  total  of  663,144  photons  were  counted  at  64  angles  0,  with 
L  =  64  bins  on  the  linear  detector  array;  in  actuality,  only  44  of  the  bins  collect  positive 
counts  because  the  support  of  the  phantom  is  contained  in  a  smaller  disk  of  diameter  44 
pixels. 

Reconstructions  are  depicted  in  Panels  B-E  of  Figure  16.  All  were  constructed  on 
the  range  [0,63]  with  parameter  b  =  12.  The  MMSE  reconstruction,  with  3  estimated  by 
EM  (5.8)  is  shown  in  Panel  16B.  When  0  was  initialized  at  0°  =  0.0,  the  successive  EM 
iterates  from  (5.8b)  were  0.52,  0.72,  0.85,  1.01,  1.18,  1.29,  1.34,  1.38,  1  40,  1.41,  . ..,  1.47 
after  thirteen  steps  of  (5.8b).  The  MMSE  in  Panel  16B  was  run  at  0  =  1.47. 

The  maximum  likelihood  (via  EM)  reconstruction  is  shown  in  Panel  16C.  Panel  161) 
shows  an  MMSE  run  with  a  value  of  0  =  0  52,  which  is  too  small  (uudersmoothing). 
Panel  16E  shows  an  MMSE  run  with  a  value  of  0  =  4.40,  which  is  too  large  (oversmoothing). 

FIGURE  16  IS  PLACED  AT  THE  END  OF  THE  TEXT 

Experiment  2.  A  total  of  124,136  photons  were  counted  from  a  cross  section  of  a  patient  's 


tono,  including  the  lung*.  For  the  reconstructions,  we  set  the  linear  attenuation  function 
again  at  n  s  0.2.  The  reconstructions  were  done  on  the  range  [0,63]  with  fixed  6  =  12. 

Panel  18A  shows  the  maximum  likelihood  (EM)  reconstruction.  The  “hot  spot” 
(tumor)  in  the  lung  is  apparent,  but  local  structure  is  difficult  to  distinguish.  Panel  18B 
shows  the  MMSE  reconstruction  with  0  estimated  at  0  =  4.56  after  four  steps  of  the  EM 
estimation  procedure  (5.8);  here  we  initialized  0°  =  6.0. 

FIGURE  18  IS  PLACED  AT  THE  END  OF  THE  TEXT. 


DIRECTIONS  FOR  FUTURE  RESEARCH.  (1)  Fast,  Direct  Parameter  Esti¬ 
mators.  The  smoothness  of  the  reconstructions  is  sensitive  to  the  pivotal  parameter  (3. 
Controlled  experiments,  using  samples  from  the  Gibbs  prior  with  known  (3,  indicate  that 
EM  (via  equation  (5.8))  is  an  effective  procedure  for  estimating  0.  However,  it  is  very 
computationally  intensive.  We  are  continuing  to  study  an  alternative  method  of  moments 
estimator.  The  goal  is  to  have  a  direct  estimation  method  for  0  that  can  be  applied  to  the 
observable  Y  without  requiring  intermediate  reconstruction  of  X .  We  construct  a  statistic 
M(Y)  based  on  the  notion  that  the  smoothness  of  Y  will  reflect  the  magnitude  of  0  in  the 
same  way  that  the  smoothness  of  X  does. 


As  an  example  consider  the  following  statistic,  which  we  intend  to  study  further. 
For  the  detector  bin  at  angle  0*  and  at  sampling  step  Oj,  denote  t  =  (o j,0t)  and  t+  = 
(oJ+l,0fc),j  =  1,...,  L  -  1,  k  =  l,...,n.  Also,  introduce  the  notation  a(t)  =  (Arl)(t),  where 
1  is  the  vector  with  components  identically  equal  to  one;  o(f)  is  simply  the  row-sum  of  Aj 
associated  with  the  detector  at  location  t.  Then  define  the  moment  statistic 

flL.Jjl.Y-Jj _ 111. 

\a(t)  a(t+)J  a2(t)  a2(t+) 


(5.9) 


„  (t/2)  +  R 

"<n  =  £  £ 

fc=l  ,=<L/J)-ftl 


In  the  limits  on  the  summation  R  is  a  radius  of  a  window  on  the  detector  array  over  which 
the  summands  are  accumulated.  The  expectation  of  M(Y),  for  given  X ,  is  a  measure  of 
roughness  of  normalized  ART  projections  of  X: 


E(M(Y) \X)  =  £Er’ 


AtXM 

a(t) 


ATX(l+) 

a(t+) 


We  anticipate  that  the  expectation  £/j(Af(y)]  with  respect  to  the  prior  will  have  the  same 
general  behavior  as  Eg\V(X))  in  (5.6).  Accordingly,  we  define  the  moment  estimate  0m  of 
13  as  the  root  of  the  equation 


(5.10) 


E0[M(Y)}=  M(Y). 


The  effort  to  compute  0m  is  trivial,  once  the  left-hand  side  of  (5.10)  is  known  as  a  function 
of  /?. 


We  have  constructed  the  curve  describing  Ep[M(Y )]  using  the  same  simulated  X-data 
that  generated  Ep[V(X)]  in  Figure  15.  Figure  17  shows  the  resulting  curve;  it  does,  indeed, 
exhibit  the  same  qualitative  behavior  as  the  curve  in  Figure  15.  We  plan  to  test  this  and 
other  moment  estimators  on  real  and  simulated  data,  in  order  to  develop  an  accurate  and 
computationally  effecient  estimator  for  (3. 


EXPECTED  1I(Y)  OVER  CENTRAL  21  PROJECTION  BINS 

(  DC  L  TA—  1  SC  ) 


Figure  17. 


Our  initial  experiments  with  the  moment  estimator  suggest  that  the  statistic  M(-)  is 
not  robust  as  one  would  like  with  respect  to  certain  kinds  of  boundary  singularities  in  X. 
We  intend  to  study  alternatives  to  M  in  which  the  summands  are  less  sensitive  to  large  local 
variations  than  are  the  quadratic  terms  in  (5.9).  Analytical  difficulties  are  compounded, 
however,  for  the  robust  alternatives. 

(2)  Improvements  of  the  Degradation  Model.  The  model  presented  here  includes  the 
predominant  physical  effects.  But  we  have  assumed  that  other  potentially  significant  effects, 


34 


such  as  photon  scattering  and  background  radiation,  to  be  negligible.  Further,  we  have  not 
included  effects  from  the  sensor,  such  as  imperfect  collimation,  blurring,  and  noise.  It  has 
been  established  that  these  effects  can  contribute  significantly  to  the  degradation  (see,  for 
example,  [19]).  We  note,  however,  that  the  reconstruction  methods  described,  since  they  are 
based  on  the  generally  applicable  principles  of  maximum  likelihood  and  Bayes  optimality,  are 
adaptable  to  models  incorporating  additional  physical  and  sensor  effects.  We  are  currently 
studying  extensions  for  scattering  and  collimation  errors.  These  extensions  will  significantly 
increase  the  effective  size  of  the  ART,  and  thereby  present  difficult  computational  challenges. 

(3)  Alternative  Methods  of  Noninvasive  Testing.  We  intend  to  study  extensions 
of  our  Bayesian  reconstruction  method  to  other  modalities  of  noninvasive  imaging  and 
nondestructive  testing.  Currently  we  are  initiating  studies  with  ultrasonic  imagery. 

6.  GLOBAL  IMAGE  ANALYSIS 

Our  model-based  approach  to  image  processing  has  its  roots  in  a  general  pattern  theory 
layed  out  by  Grenander  [32].  In  our  discussion  here  of  global  image  analysis  we  shall  adopt 
the  terminology  and  notation  developed  for  that  theory,  which  is  especially  well  suited  for 
describing  global  geometric  attributes.  There  is  an  equivalent,  but  less  natural,  formulation 
in  the  statistical  mechanics/MRF  language  used  in  the  discussion  above. 

AIM.  Our  approach  is  model  based.  Starting  from  a  mathematical  model  of  the  patterns 
(image  ensemble)  that  we  are  dealing  with  we  derive  algorithms  for  pattern  inference  from 
the  models  by  appealing  to  first  principles. 

We  are  primarily  concerned  with  patterns  exhibiting  a  high  degree  of  variability,  such 
as  those  appearing  in  biology/medicine.  It  is  then  natural  to  account  for  the  variability  by 
probabilistic  models. 

A  situation  of  obvious  practical  importance,  as  well  as  scientifically  challenging,  is 
when  we  know  in  advance  that  the  image  contains  an  object  of  known  type  but  of  unknown 
shape.  We  may  know  for  example  that  the  image  contains  a  kidney,  a  heart,  a  leaf,  and  so 
on,  and  we  seek  a  precise  characterization  of  the  object:  object  identification. 

The  purpose  may  be  to  detect  anomalies  or  to  measure  certain  characteristics  on  the 
globally  estimated  object.  What  matters  then  is  the  variability  in  the  object  shape. 

We  desire  an  accurate  characterization  even  though  the  picture  may  be  very  noisy.  To 
get  that  we  appeal  to  the  prior  knowledge  we  have  of  the  variation  in  shape  for  the  object. 


It  is  dear  that  global  models  are  needed  for  such  a  task,  it  is  a  matter  of  high  level 
image  analysts. 

GENERAL  APPROACH.  We  have  long  employed  methods  from  metric  pattern  theory 
to  formulate  and  analyze  the  random  variation  of  patterns.  An  early  realization  of  this  idea 
was  given  in  Grenander  [30],  and  was  applied  to  pictorial  patterns  in  Gronander  [31],  pp. 
40-51. 

The  resulting  probability  measures  are  known  as  regularity  controlled  probabilities, 
or  Gibbs  measures  on  configuration  spaces,  or  incompletely  observed  Markov  processes  on 
graphs.  The  underlying  prindple  is  that  they  attempt  to  catch  variability  by  relaxing 
regularity  in  the  patterns. 

Recently  this  was  generalized  and  studied  in  greater  depth  in  Grenander- Keenan  [34] 
in  the  following  way.  Starting  from  primitive  geometric  objects,  the  generators  g,  in  some 
space  G,  we  combine  them  into  configurations 

(6.1)  c  =  o(g0,g1,...,gn-i)eC(Tl) 

in  a  configuration  space.  Here  a  denotes  a  connector  graph  with  n  sites,  and  regularity 
constraints  are  imposed  on  any  generators  connected  directly  by  a  segment  in  the  graph; 
the  generator  gt  is  situated  at  site  t  of  a. 

We  start  with  one  or  several  templates  not  necessarily  with  the  same  connector 
graph.  For  a  given  group  S ,  the  similarity  group,  with  elements  generically  denoted  by  s, 
we  define  a  measure  on  Sn  by 

.  n-l  n— 1 

(6-2)  p(s 0,3l,.  •  •  ,3„_i)  =  —  Qj(Sj) 

"  i=0  i=0 

where  p  is  understood  as  a  Radon-Nikodym  derivative  with  respect  to  a  fixed  measure  on 
5"  ,  usually  a  Lebesgue  or  counting  measure.  The  acceptor  functions  A,-  express  the  cou¬ 
pling  between  contiguous  group  elements  s,- — group  elements  that  are  compatible  with  the 
connector  graph  a — and  the  weight  functions  Qi  control  the  frequencies  of  their  occurrence. 

Applying  the  stochastic  operators  to  the  templates 
(5-8)  eM -(*,*, . a„-i)c<*'>, 

but  restricted  to  the  regular  configuration  space  0(71),  will  define  a  probability  measure  on 
C(H).  A  rigorous  study  of  how  this  is  done  can  be  found  in  Grenander- Keen  an  [34]  and  in 
Grenander  [32]  (Volume  III,  Chapter  3). 
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To  make  the  above  concrete  let  us  consider  some  examples.  For  pictorial  patterns  in 
the  plane  we  can  choose,  for  example,  generators  as  points  in  R2  and  S  as  the  translation 
group.  Or  we  can  let  the  generators  be  sides  in  polygons  (or  splines  with  other  arc  elements) 
and  S  the  group  of  uniform  scale  change.  Or,  with  the  same  generators,  we  can  let  S  be 
the  affine  group  in  the  plane,  and  so  on. 

In  these  examples  the  obvious  choice  for  the  connector  graph  is  o  G  CYCLIC ,  but  al¬ 
ternatives  with  ‘transversal’  bonds  are  also  of  interest  for  generating  patterns  with  (relaxed) 
symmetry,  for  example  of  bilateral  form. 

We  believe  that  the  creation  and  mathematical  analysis  of  such  random  shape  models 
are  fundamental  for  achieving  the  tasks  of  global  image  analysis.  Once  the  models  are 
well  understood  and  empirically  supported  we  can  apply  methods  of  statistical  inference  to 
them. 

SPECIAL  RESULTS.  We  have  carried  out  pattern  inference  in  2-D  as  outlined  above  in 
a  number  of  instances. 

Simulated  Pictures.  To  test  the  methodology,  image  reconstruction  was  implemented 
based  on  random  shape  models.  The  result  compare  favorably  with  local  methods.  Some 
examples  are  given  in  Grenander  [33]  and  Grenander-Keenan  [34]. 

Leaf  Shapes.  Simulated  data  are  good  for  testing  methods  and  software,  but  are  not 
enough.  It  is  necessary  to  try  the  methods  on  real  data.  We  have  therefore  also  applied 
the  same  methodology  to  a  type  of  biological  patterns,  namely  the  shape  of  leaves  from 
trees.  The  data  collection  phase  has  been  completed,  as  well  as  the  development  of  software 
for  handling  images  obtained  by  hardware  we  have  acquired  recently.  Algorithms,  both 
for  image  restoration  and  object  identification,  have  been  derived  and  some  preliminary 
experiments  have  been  carried  out.  This  work  with  A.  Knoerr  is  still  in  progress. 

Limit  Theorems.  We  have  used  our  method  of  stochastic  relaxation  extensively  in  our 
computer  experiments.  The  power  of  it  resides  in  its  general  applicability,  but  we  pay  a 
price  for  its  universality:  it  often  requires  massive  amounts  of  CPU  time.  This  is  especially 
true  in  3-D;  see  below. 

We  have  tried  to  compensate  for  this  by  analytical  means.  It  has  been  known  for  sev¬ 
eral  years  that,  in  special  cases,  and  when  stochastic  relaxation  is  slowest  (strong  couplings, 
large  configurations),  the  Markov  processes  on  graphs  can  be  approximated  by  Gaussian 
processes;  see  Chow-Grenander  [13]. 

More  recently  it  has  been  shown  that  such  limit  theorems  can  be  proved  in  more 
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general  situations,  Chow  [12]  and  Grenander-Sethuraman  [35].  Since  Gaussian  processes 
are  easily  simulated  this  makes  it  possible  to  reduce  the  CPU-time  requirement  drastically. 

Fitting  the  Models.  In  order  that  our  analyses  be  practical  the  model  must  be  anchored 
empirically.  In  particular  their  parameters  must  be  estimated  from  data.  In  the  random 
shape  models  we  do  not  have  statistical  homogeneity:  generators  at  different  sites  may  have 
quite  different  statistical  properties.  We  must  therefore  study  estimation  based  on  a  sample 
of  several  pictures.  This  has  been  done  in  Osborn  [51]. 

3-D  Experiments.  We  have  tried  our  models  in  3-D  situations,  so  far  only  for  pattern 
synthesis.  In  this  case  we  used  stochastic  relaxation  in  a  straightforward  manner,  since 
we  do  not  yet  have  access  to  the  sort  of  limit  theorems  for  3-D  mentioned  above.  The 
computing  is  so  heavy  that  it  was  necessary  to  use  a  supercomputer,  a  CRAY  XMP  48. 

DIRECTIONS  FOR  FUTURE  RESEARCH.  Based  on  the  methodology  we  have 
developed  and  the  the  experience  we  have  gained  in  applying  them,  we  shall  proceed  to 
more  challenging  tasks  in  global  image  analysis. 

With  priors  of  the  form  (6.2)  and  denoting  the  resulting  random  image  by  /  we  relate 
I  to  the  observed  digital  Iv  by  a  stochastic  deformation  mechanism  V 

V  :1  —>lv 

mapping  the  pure  image  algebra  2  =  R2  (where  R  is  the  identification  rule,  see  Grenander 
[32],  Volume  I,  Secton  3.1)  into  the  deformed  one.  Before  discussing  the  form  of  V  it  is 
necessary  to  mention  some  difficulties  that  arise  as  we  go  from  2-D  to  3-D. 

As  generators  we  can  choose  vertices  or  edges  or  faces  of  polyhedra  (or  other  spline 
surfaces):  all  three  possibilities  will  be  examined.  We  shall  choose  the  connector  graph  s 
for  the  template  (or  templates)  as  the  vertices  of  a  regular  icosahedron  further  refined  by 
successive  triangulations,  using  equilaterals.  This  results  in  a  regular  distribution  of  points 
on  the  template  surface  assumed  to  be  topologically  equivalent  to  the  sphere,  and  avoids 
the  singular  behavior  of  o  at  the  poles  if  we  had  used  a  standard  longitude-latitude  grid. 
The  successive  triangulations  are  useful  for  implementing  the  inference  algorithms  which 
will  be  hierarchic:  starting  with  crude  templates  they  will  be  successively  refined. 

Global  3-D  Inference.  Extending  what  we  did  in  R2  to  R3  we  have  to  simulate  the 
posterior 

(6.5)  p(c  |  Iv)  <x  p{c)p(I  =  Rc  — »  Iv) 

where  the  last  factor  is  due  to  V. 


We  shall  investigate  three  models  for  the  deformation  V: 

a)  visible  light 

b)  range  data 

c)  ultrasound. 

We  already  have  the  hardware  and  some  of  the  software  for  a).  At  present  we  do 
not  have  any  sensors  for  b)  and  c)  and  will  have  to  operate  with  simulated  data  during  the 
stage  when  we  fix  the  models  and  derive  inference  algorithms. 

This  is  not  sufficient,  however,  we  should  also  test  the  methods  in  a  real  world  setting. 
We  shall  therefore  try  to  get  access  either  to  sensors  for  b)  and  c),  or,  at  least  to  data 
obtained  under  realistic  conditions.  We  do  have  good  sources  for  echocardiographic  data 
(c),  in  particular. 

To  simulate  (6.5)  we  shall  try  to  avoid  straightforward  stochastic  relaxation  with  single 
site  updating  since  this  would  be  computationally  awkward.  Instead  we  appeal  to  pattern 
theoretic  limit  theorems  of  the  type  discussed  above,  so  that  we  can  simulate  conditional 
probabilities  for  subsets  of  the  configuration  c  with  more  than  a  single  site.  As  a  matter 
of  fact  we  may  be  able  to  simulate  the  whole  configuration  using  (6.2)  combined  with  the 
factor  p(c  ->  Iv).  This  appears  to  be  computationally  feasible. 

Note  that  in  this  way  we  make  global  inferences  about  the  image  that  incorporate 
prior  knowledge  about  the  whole  surface.  A  number  of  analytical  problems  have  to  be 
resolved,  however,  before  this  ambitious  goal  can  be  achieved. 

The  Correspondence  Problem.  Among  these  we  mention  in  particular  the  correspon¬ 
dence  problem.  Consider  the  boundary  described  by  the  configuration  c  and  a  boundary  c* 
obtained  from  Jv.  Say  that  c*  consists  of  a  set  of  points  zv,  for  u  =  1,2, . . . ,  A,  that  have 
arisen  from  some  points  on  c.  If  we  knew  the  („  we  could  calculate,  at  least  in  principle, 
the  likelihood  that  V  carries  the  {£„}  into  {zv}. 

The  trouble  is  that  this  correspondence  is  not  known  to  us.  Instead  we  will  have  to 
treat  the  C,  as  nuisance  parameters  that  have  to  be  estimated  during  the  inference.  In  2-D 
this  can  be  done  with  reasonable  computing  effort  by  a  dynamic  programming  algorithm. 
In  3-D  we  do  not  yet  have  any  fast  algorithm  for  dealing  with  the  correspondence  problem. 

This  is  for  the  case  when  the  inference  is  based  on  an  ‘empirical  boundary’  c* .  If, 
instead,  we  use  only  contrast  based  inference,  as  is  natural  for  low  signal-to-noise  ratios, 
the  correspondence  problem  does  not  appear. 
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Generator  Estimation.  We  do  not  avoid  it  entirely,  though,  since  we  meet  it  during  an 
earlier  phase  when  the  template  surface  (or  curve  in  2-D)  is  estimated  from  data.  Again 
we  do  not  know  the  correspondence.  We  plan  to  deal  with  this  difficulty  by  extending  a 
method  for  template,  or  generator,  estimation  that  we  have  developed,  but  not  yet  tested, 
in  R2. 

The  Icosahedral  Process.  Another  analytical  topic  that  has  to  be  studied  is  what  are 
the  properties  of  the  priors  to  be  used  in  3-D  for  the  random  template  surface.  To  fix  ideas 
say  that  we  choose  edges  as  generators.  The  measure  in  (6.2)  can  be  viewed  as  a  Markov 
process  on  the  triangulated  icosahedral  graph  taking  values  in  a  low  dimensional  Lie  group 
5.  Applying  the  Sj-elements  to  the  edges  g °  of  the  template  we  must  have  conditions 
satisfied  in  order  that  are  consistent 

(6.6)  LCy cie  =  'y  ^  st9i  ~  0 

cycle 

for  all  closed  cycles  of  the  graph  o.  We  can  find  p  independent  cycles,  where  p  is  the 
cyclomatic  number  for  the  graph,  by  a  constructive  procedure  starting  from  a  minimal 
spanning  tree  (see  for  example,  Giblin  [26]).  But,  since  the  number  of  constraints  will  be 
large  in  3-D  in  contrast  to  2-D,  we  need  a  systematic  and  computationally  feasible  way  of 
constraining  the  probability  measure  of  the  <7j’s. 

We  should  also  extend  the  limit  theorems  for  Markov  processes  on  graphs  described 
above  to  this  sort  of  graph.  The  limiting  Gaussian  processes  are  believed  to  have  covariance 
functions  that  are  Green’s  functions  of  differential  operators  simply  related  to  the  neigh¬ 
borhood  structure  of  the  graph,  but  this  statement  should  be  made  precise  and  proved. 

Ergodicity.  In  situations  where  the  limit  theorems  do  not  apply  (for  example  with  weak 
couplings)  we  are  forced  to  employ  stochastic  relaxation.  To  justify  stochastic  relaxation 
we  have  to  prove  ergodicity  (in  time)  of  a  Markov  chain  with  a  continuous  state  space.  This 
reduces,  in  each  special  case,  to  an  “elementary”  problem  in  plane  geometry. 

As  an  example  consider  the  configuration  space  C  of  all  closed  and  non-self-intersecting 
polygons  in  R2  (similarly  in  R3)  whose  sides  have  given  directions  but  arbitrary  lengths. 
Denote  the  lengths  by  li,h,  ■  ■  ■  ,ln  so  that  we  can  represent  a  configuration  as  c  = 
(/i,/2,...,/n)-  The  ergodicity  mentioned  can  be  established  by  proving  that  any  two  points 

(6.7)  Cj,C2  E  interior(C)  C  R+-2 

can  be  joined  by  a  continuous  curve  passing  through  the  interior  of  C. 


This  innocuous  proposition  has  not  yet  been  proved  in  spite  of  several  attempts  and 
discussions  with  a  number  of  geometers.  Many  versions  of  it  appear,  for  different  updating 
schemes,  but  only  some  have  been  proved  so  far. 

Software.  We  need  software,  for  the  object  identification  algorithms,  but  also  for  the  data 
analysis  of  images  that  will  precede  and  accompany  the  analysis  of  the  models  to  be  used. 

During  the  data  analysis  stage  computing  efficiency  is  not  as  crucial  as  for  the  inference 
algorithms.  We  have  started  to  develop  data  analysis  programs  in  APL  and  plan  to  do  this 
for  the  following  tasks. 

a)  local  Karhunen-Loeve  analysis 

b)  smoothing  programs,  linear,  median,  and  trimmed  mean  filtering 

c)  primitive  operations  on  surfaces:  rotate  around  an  axis,  stretch,  turn  around  an  axis, 
push  and  pull  the  surface  at  a  point,  all  for  icosahedral/tri angulation  vertices 

d)  display  template  surfaces,  I  ,  and  the  object  identified  by  our  inference  algorithms. 

e)  simulation  of  range  data  and  ultrasound  data 

f)  standard  image  operations  like  edge  detection. 

This  code  should  be  easy  to  write  and  will  be  useful  for  data  analysis. 


FIGURES. 


Figure  Legends. 

Figure  2.  Wood  on  plastic  background. 

Figure  3.  Carpet  on  plastic  background. 

Figure  4.  Wood,  carpet,  and  cloth  on  plastic  background. 
Figure  10.  Synthetic  aperture  radar  of  water  and  ice. 
Figure  11.  Brodatz  collage  with  four  textures. 

Figure  12.  House  scene. 

Figure  16.  Phantom  isotope  density  and  reconstructions. 
Figure  18.  Lung  cross-section  and  reconstructions. 
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Figure  2.  Wood  on  plastic  background. 


Figure  3.  Carpet  on  plastic  background. 


s 

=5 

S 

-tty 

S 

r^i 

SI 

i 

S 

S 

s 

S 

s 

Figure  11.  Brodatz  collage  with  four  textures. 


Figure  16E.  Phantom  isotope  density  and  reconstructions  (cont). 
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